Quality Control

Build QC your way - flexible checks with a consistent framework.

Why use Time-Stream?

Quality control is essential for any environmental dataset, but QC rules vary between projects, organisations, and sensor types. Time-Stream doesn’t make those decisions for you - instead, it provides a framework for applying common types of QC.

One-liner

QC checks are lightweight, configurable, and explicit:

tf_flagged = tf.qc_check(
    "comparison", "rainfall", compare_to=0, operator="<",
    flag_params=("rainfall_flag", "FLAGGED"),
)

A single call with rich meaning: “Check where my rainfall data is less than 0, and record the result in the rainfall_flag flag column.”

Key benefits

  • You stay in control: flexibility to choose your thresholds, operators, and ranges.

  • Reproducible QC: the same logic can be applied across datasets.

  • Traceable results: checks can add explicit boolean columns or flag values for later analysis.

  • Flexible: combine multiple checks, apply them in sequence, or restrict them to intervals.

In more detail

The qc_check() method applies a single QC check to one column. Each QC check is configurable through parameters specific to that check.

Let’s look at the method in more detail:

TimeFrame.qc_check(check, column_name, observation_interval=None, flag_params=None, **kwargs)

Apply a quality control check to the TimeFrame.

Parameters:
  • check (Union[str, Type[QCCheck], QCCheck]) – The QC check to apply.

  • column_name (str) – The column to perform the check on.

  • observation_interval (tuple[datetime, datetime | None] | None) – Optional time interval to limit the check to.

  • flag_params (tuple[str, str | int] | None) – Tuple of (flag column name [str], flag value [str | int]). If provided, the given flag value is added to the flag column wherever the QC check returns True. If not provided, the result of the QC check is returned as a boolean Series.

  • **kwargs – Parameters specific to the check type.

Return type:

TimeFrame | Series

Returns:

Result of the QC check, either as a boolean Series or added to the TimeFrame dataframe

Quality control methods

comparison

time_stream.qc.ComparisonCheck

What it does: Compares values against a constant or list using a comparison operator (<, <=, >, >=, ==, !=, is_in).

When to use: Use for value thresholds (e.g. negative rainfall) or matching against lists of known error codes.

Additional args:

  • compare_to: The value (or list of values for is_in) to compare against.

  • operator: The comparison operator string.

  • flag_na: If True, also flag NaN/null values as failing the check (default: False).

Example usage:

Temperature greater than or equal to 50:

tf = tf.qc_check(
    "comparison", "temperature", compare_to=50, operator=">=",
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 0           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘

Sensor codes within a list:

error_codes = [991, 992, 993, 994, 995]
tf = tf.qc_check(
    "comparison", "sensor_codes", compare_to=error_codes, operator="is_in",
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 1           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 0           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
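
The comparison semantics described above can be sketched in plain Python. This is a minimal illustration rather than Time-Stream's implementation: comparison_check, is_missing and the OPERATORS table are hypothetical names, and reporting a missing value as whatever flag_na is set to is an assumption about null handling.

```python
import math
import operator

# Hypothetical lookup table (not part of Time-Stream): maps the operator
# strings listed above onto plain Python comparisons.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "is_in": lambda value, allowed: value in allowed,
}


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def comparison_check(values, compare_to, op, flag_na=False):
    """Return one boolean per value; True means the value is flagged."""
    compare = OPERATORS[op]
    result = []
    for value in values:
        if is_missing(value):
            # Assumption: a missing value is flagged only when flag_na is True.
            result.append(flag_na)
        else:
            result.append(compare(value, compare_to))
    return result


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
sensor_codes = [992, 1, 1, 1, 1, 1, 1, 991, 995, 1]

hot = comparison_check(temperature, 50, ">=")
bad_codes = comparison_check(sensor_codes, [991, 992, 993, 994, 995], "is_in")
```

In this sketch, hot marks the readings of 50 and 52, and bad_codes marks the rows holding 992, 991 and 995, which matches the flag_column values in the two tables above.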

range

time_stream.qc.RangeCheck

What it does: Checks whether values fall inside or outside a min-max interval.

When to use: Use for physical plausibility bounds, such as temperature between -30 and 50 °C.

Additional args:

  • min_value: Minimum of the range.

  • max_value: Maximum of the range.

  • closed: Which sides of the interval are inclusive - "both", "left", "right", or "none" (default: "both").

  • within: Whether to flag values within the range (True, default) or outside it (False).

Example usage:

Temperatures outside of the range -30 to 50:

tf = tf.qc_check(
    "range",
    "temperature",
    min_value=-30,
    max_value=50,
    closed="none",  # Range is not inclusive of min and max value
    within=False,  # Flag values outside of this range
    flag_params=("flag_column", "FLAGGED"),
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
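
How closed and within interact is easiest to see in a small plain-Python sketch. The range_check function below is a hypothetical stand-in, not the library's code, and it assumes the common convention that "left" makes the minimum inclusive and "right" makes the maximum inclusive.

```python
def range_check(values, min_value, max_value, closed="both", within=True):
    """Return True for each value that should be flagged."""
    if closed not in {"both", "left", "right", "none"}:
        raise ValueError(f"unknown closed option: {closed!r}")
    # Assumption: "left" includes min_value, "right" includes max_value.
    include_min = closed in {"both", "left"}
    include_max = closed in {"both", "right"}
    flags = []
    for value in values:
        above_min = value >= min_value if include_min else value > min_value
        below_max = value <= max_value if include_max else value < max_value
        inside = above_min and below_max
        # within=True flags values inside the interval, within=False the rest.
        flags.append(inside if within else not inside)
    return flags


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]

outside_open = range_check(temperature, -30, 50, closed="none", within=False)
outside_closed = range_check(temperature, -30, 50, closed="both", within=False)
```

With closed="none" the boundary value 50 counts as outside the range, so outside_open flags -35, 50 and 52 as in the table above. With closed="both" the value 50 is inside the range, and outside_closed flags only -35 and 52.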

time_range

time_stream.qc.TimeRangeCheck

What it does: Flags rows where the primary time column falls within a given range. Accepts datetime.time, datetime.date, or datetime.datetime bounds.

When to use: Use for known bad periods such as sensor outages or automated calibration times.

Additional args:

  • min_value: Start of the time range.

  • max_value: End of the time range.

  • closed: Which sides of the interval are inclusive - "both", "left", "right", or "none" (default: "both").

  • within: Whether to flag values within the range (True, default) or outside it (False).

Example usage:

Flag rainfall values between the hours of 01:00 and 03:00:

from datetime import time

tf = tf.qc_check(
    "time_range",
    "precipitation",
    min_value=time(1, 0),
    max_value=time(3, 0),
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 1           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 0           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 0           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘

Flag temperature values between 03:30 and 09:30 on the 1st January:

from datetime import datetime

tf = tf.qc_check(
    "time_range",
    "temperature",
    min_value=datetime(2023, 1, 1, 3, 30),
    max_value=datetime(2023, 1, 1, 9, 30),
    flag_params=("flag_column", "FLAGGED"),
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 0           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 1           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 1           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 1           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
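
The difference between time-of-day bounds and full datetime bounds can be sketched in plain Python. The time_range_check function is a hypothetical illustration, not the library's implementation: it covers only datetime.time and datetime.datetime bounds (datetime.date is left out) and assumes both ends are inclusive, as with the default closed="both".

```python
from datetime import datetime, time, timedelta


def time_range_check(timestamps, min_value, max_value):
    """Flag timestamps inside [min_value, max_value], both ends inclusive."""
    flags = []
    for stamp in timestamps:
        # datetime.time bounds compare against the time of day only;
        # datetime.datetime bounds compare against the full timestamp.
        key = stamp.time() if isinstance(min_value, time) else stamp
        flags.append(min_value <= key <= max_value)
    return flags


# Ten hourly timestamps starting at 2023-01-01 00:00, as in the tables above.
timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(10)]

daily_window = time_range_check(timestamps, time(1, 0), time(3, 0))
absolute_window = time_range_check(
    timestamps, datetime(2023, 1, 1, 3, 30), datetime(2023, 1, 1, 9, 30)
)
```

In this sketch, a time-of-day window would match the same hours on every day of a longer series, whereas a datetime window matches one specific period.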

spike

time_stream.qc.SpikeCheck

What it does: Detects sudden jumps by assessing differences with neighbouring values. A point is flagged when the combined neighbour difference (minus skew) exceeds twice the threshold.

When to use: Use for detecting unrealistic single-point spikes - isolated values that jump sharply compared to their neighbours.

Additional args:

threshold: The spike detection threshold.

Example usage:

Spike check on temperature data:

tf = tf.qc_check(
    "spike", "temperature", threshold=10.0, flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 0           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 0           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘

Note

The result doesn’t flag the neighbouring high values of 50 and 52. The spike test detects a sudden jump where one value sits between otherwise normal values.

Note

The check itself returns null for the first and last values, because the spike test relies on comparisons with both neighbouring values. These rows are left unflagged (0) in the flag column above.
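
One possible reading of the spike rule can be sketched in plain Python. The exact definition of "skew" is not given on this page, so the formula below is an assumption: skew is taken as the absolute difference between the step into a point and the step out of it. The spike_check function is a hypothetical name, not the library's code.

```python
def spike_check(values, threshold):
    """Return True/False per point, or None where a neighbour is missing."""
    flags = [None] * len(values)
    for i in range(1, len(values) - 1):
        step_in = abs(values[i] - values[i - 1])
        step_out = abs(values[i + 1] - values[i])
        total = step_in + step_out
        # Assumption: skew measures how lopsided the two steps are.
        skew = abs(step_in - step_out)
        # total - skew equals twice the smaller step, so a point is flagged
        # only when it jumps away from BOTH neighbours by more than threshold.
        flags[i] = (total - skew) > 2 * threshold
    return flags


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
spikes = spike_check(temperature, threshold=10.0)
```

Under this assumption only the -35 reading is flagged. The values 50 and 52 each have one small step to a neighbour, so they are treated as a sustained change rather than an isolated spike.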

flat_line

time_stream.qc.FlatLineCheck

What it does: Detects consecutive repeated (or near-repeated) values in a column.

When to use: Use when a sensor stuck at a fixed value should be flagged as suspect.

Additional args:

  • min_count: Minimum number of consecutive repeated values required for a flat line (must be at least 2).

  • tolerance: Optional tolerance for near-equality comparison. When set, consecutive values differing by less than or equal to this amount are considered equal (default: None, exact equality).

  • ignore_value: Optional value or list of values that are allowed to repeat without being flagged.

Example usage:

Flag temperature values stuck at the same reading for 3 or more consecutive timesteps:

tf = tf.qc_check(
    "flat_line", "temperature", min_count=3, flag_params=("flag_column", "FLAGGED")
)
shape: (10, 3)
┌─────────────────────┬─────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ flag_column │
│ ---                 ┆ ---         ┆ ---         │
│ datetime[μs]        ┆ f64         ┆ i64         │
╞═════════════════════╪═════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0        ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 02:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 22.0        ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 21.0        ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 0.0         ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 0.0         ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 0.0         ┆ 1           │
└─────────────────────┴─────────────┴─────────────┘

Using ignore_value - suppress flagging when the repeated value is 0.0:

tf = tf.qc_check(
    "flat_line", "temperature", min_count=3, ignore_value=0.0, flag_params=("flag_column", "FLAGGED")
)
shape: (10, 3)
┌─────────────────────┬─────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ flag_column │
│ ---                 ┆ ---         ┆ ---         │
│ datetime[μs]        ┆ f64         ┆ i64         │
╞═════════════════════╪═════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0        ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 02:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 22.0        ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 21.0        ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 0.0         ┆ 0           │
│ 2023-01-01 08:00:00 ┆ 0.0         ┆ 0           │
│ 2023-01-01 09:00:00 ┆ 0.0         ┆ 0           │
└─────────────────────┴─────────────┴─────────────┘

Note

More than one ignore_value can be specified as a list, e.g. [0.0, 20.0].

Using tolerance - flag values that barely change (within 0.1) for 3 or more consecutive readings:

The data below drifts slightly around 20 °C (consecutive readings differ by about 0.01 at most) before jumping to a different range. With tolerance=0.1, these near-flat runs are caught where exact equality would miss them.

tf = tf.qc_check(
    "flat_line", "temperature", min_count=3, tolerance=0.1, flag_params=("flag_column", "FLAGGED")
)
shape: (10, 3)
┌─────────────────────┬─────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ flag_column │
│ ---                 ┆ ---         ┆ ---         │
│ datetime[μs]        ┆ f64         ┆ i64         │
╞═════════════════════╪═════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0        ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 02:00:00 ┆ 20.005      ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 20.001      ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 19.991      ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 22.0        ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 20.99       ┆ 1           │
│ 2023-01-01 07:00:00 ┆ 21.003      ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 21.009      ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 20.997      ┆ 1           │
└─────────────────────┴─────────────┴─────────────┘
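
The way min_count, tolerance and ignore_value combine can be sketched as a run-length scan in plain Python. The flat_line_check function is a hypothetical illustration, not the library's implementation. It assumes that tolerance is applied between consecutive readings and that a run is skipped only when every value in it is listed in ignore_value.

```python
def flat_line_check(values, min_count, tolerance=None, ignore_value=None):
    """Return True for each value that belongs to a flat run."""
    if min_count < 2:
        raise ValueError("min_count must be at least 2")
    if ignore_value is None:
        ignored = []
    elif isinstance(ignore_value, (list, tuple, set)):
        ignored = list(ignore_value)
    else:
        ignored = [ignore_value]

    def same(a, b):
        # Exact equality by default, near-equality when a tolerance is set.
        return a == b if tolerance is None else abs(a - b) <= tolerance

    flags = [False] * len(values)
    start = 0  # index where the current run began
    for i in range(1, len(values) + 1):
        run_ends = i == len(values) or not same(values[i - 1], values[i])
        if run_ends:
            run = values[start:i]
            # Assumption: a run is ignored only if all its values are ignored.
            if len(run) >= min_count and not all(v in ignored for v in run):
                flags[start:i] = [True] * len(run)
            start = i
    return flags


stuck = [18.0, 20.0, 20.0, 20.0, 20.0, 22.0, 21.0, 0.0, 0.0, 0.0]
drifting = [18.0, 20.0, 20.005, 20.001, 19.991, 22.0, 20.99, 21.003, 21.009, 20.997]

exact = flat_line_check(stuck, min_count=3)
without_zero = flat_line_check(stuck, min_count=3, ignore_value=0.0)
near_flat = flat_line_check(drifting, min_count=3, tolerance=0.1)
```

The three results reproduce the flag_column values in the three tables above. Running the drifting data with exact equality flags nothing, since no two consecutive readings are identical.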

Observation interval

Specify an observation interval to restrict the QC check to a specific time window. This is useful when:

  • You only want to QC a specific period of observations (e.g. summer 2024).

  • You need to re-run checks on recent data without reprocessing the full archive.

  • You want to exclude known bad periods (e.g. sensor maintenance) from checks.
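
The effect of an observation interval can be sketched in plain Python as a mask over the check result. The restrict_to_interval function is hypothetical. It assumes that both bounds are inclusive, that an end of None means "no upper limit" (suggested by the datetime | None type in the signature), and that rows outside the window are simply left unflagged.

```python
from datetime import datetime, timedelta


def restrict_to_interval(timestamps, flags, observation_interval):
    """Keep a flag only where the timestamp lies inside the interval."""
    start, end = observation_interval
    restricted = []
    for stamp, flag in zip(timestamps, flags):
        # Assumption: inclusive bounds, and end=None leaves the window open.
        in_window = stamp >= start and (end is None or stamp <= end)
        restricted.append(flag and in_window)
    return restricted


timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(10)]
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]

# A simple threshold check: temperature greater than or equal to 50.
flags = [value >= 50 for value in temperature]

# Only consider observations from 08:00 onwards.
recent_only = restrict_to_interval(timestamps, flags, (datetime(2023, 1, 1, 8), None))
```

Here the reading of 50 at 07:00 falls before the window and is no longer flagged, while the reading of 52 at 08:00 still is.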

Flag parameters

The result of a QC check can be consumed in one of two ways, selected via the flag_params argument:

  • Boolean series (default, flag_params omitted) - qc_check returns a Polars boolean Series of the same length as the TimeFrame, with True marking the rows that failed the check. Useful for chaining into custom expressions or feeding into add_flag() manually.

  • Flag column update (flag_params=(flag_column_name, flag_value)) - qc_check adds the given flag value to the named flag column on each failing row and returns a new TimeFrame. The flag column must already exist (see Flagging).

The examples on this page use the flag-column style. Each sets up a bitwise flag system with a single FLAGGED flag and calls init_flag_column() before running the check:

tf.register_flag_system("qc", {"FLAGGED": 1})
tf.init_flag_column("qc", "flag_column")
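
What "bitwise" means for the flag column can be sketched in plain Python. This is an illustration of the general technique only, not of Time-Stream's internals: apply_flag is a hypothetical helper, SUSPECT is a hypothetical second flag, and an initialised flag column is assumed to start at 0.

```python
# FLAGGED: 1 comes from the example above; SUSPECT is a hypothetical second
# flag added here only to show how bits combine.
FLAG_SYSTEM = {"FLAGGED": 1, "SUSPECT": 2}


def apply_flag(flag_column, failed, flag_name):
    """Return a new flag column with the named bit set on every failing row."""
    bit = FLAG_SYSTEM[flag_name]
    return [
        (current | bit) if fail else current
        for current, fail in zip(flag_column, failed)
    ]


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
flag_column = [0] * len(temperature)  # assumed state of a fresh flag column

too_hot = [value >= 50 for value in temperature]
very_hot = [value > 51 for value in temperature]

flag_column = apply_flag(flag_column, too_hot, "FLAGGED")
flag_column = apply_flag(flag_column, too_hot, "FLAGGED")  # repeating changes nothing
flag_column = apply_flag(flag_column, very_hot, "SUSPECT")
```

Because each flag occupies its own bit, the row with 52 ends up with the value 3 (FLAGGED and SUSPECT together), and either flag can still be read back independently with a bitwise AND.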

API reference

qc - Time Series Quality Control (QC) Module

qc_check(check, column_name[, ...]) - Apply a quality control check to the TimeFrame.