Quality Control

Build QC your way - flexible checks with a consistent framework.

Why use Time-Stream?

Quality control is essential for any environmental dataset, but QC rules vary between projects, organisations, and sensor types. Time-Stream doesn’t make those decisions for you - instead, it provides a framework for applying common types of QC.

One-liner

QC checks are lightweight, configurable, and explicit:

tf_flagged = tf.qc_check(
    "comparison", "rainfall", compare_to=0, operator="<", into="rainfall_flag"
)

A single call with rich meaning: “I want to flag where my rainfall data is less than 0, with the results saved to a column named rainfall_flag.”

Key benefits

  • You stay in control: choose your own thresholds, operators, and ranges.

  • Reproducible QC: the same logic can be applied across datasets.

  • Traceable results: checks can add explicit boolean columns or flag values for later analysis.

  • Flexible: combine multiple checks, apply them in sequence, or restrict them to intervals.

In more detail

The qc_check() method applies a single QC check to one column. It can return a boolean mask (for filtering) or update the TimeFrame with a new column containing the results of the QC check. Each QC check is configurable through parameters specific to that check - see examples below.

Available checks

  • "comparison" - compare values against a constant or list using operators: <, <=, >, >=, ==, !=, is_in

    Use for value thresholds or list of error codes.

  • "range" - check if values lie inside/outside a min–max interval.

    Use for physical plausibility bounds (e.g. temperature between −50 and 50 °C).

  • "time_range" - flag data between specific time ranges.

    Use for known bad periods such as sensor outages or calibration times.

  • "spike" - detect sudden jumps using neighbour differences.

    Use for unrealistic single-point spikes.
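
To make the "comparison" check concrete, here is a minimal plain-Python sketch of how the operator strings listed above could map onto a boolean mask. It is an illustration only, not Time-Stream's implementation: the names OPERATORS and comparison_mask, the sample values, and the treatment of missing values are all assumptions.

```python
import operator

# Hypothetical mapping from the operator strings listed above to Python callables.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "is_in": lambda value, options: value in options,
}


def comparison_mask(values, compare_to, op):
    """Return True for each value that meets the comparison (i.e. is flagged)."""
    compare = OPERATORS[op]
    # Assumption: missing values are passed through as None rather than compared.
    return [None if v is None else compare(v, compare_to) for v in values]


# Made-up sample data for illustration.
temperature = [24, 50, 52, 29]
sensor_codes = [992, 1, 995, 1]

threshold_flags = comparison_mask(temperature, 50, ">=")
code_flags = comparison_mask(sensor_codes, [991, 992, 993, 994, 995], "is_in")
```

In this sketch a True value means "this row met the condition", which matches the way the boolean QC columns are read in the examples that follow.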

Examples:

  1. Temperature greater than or equal to 50

tf = tf.qc_check(
    "comparison", "temperature", compare_to=50, operator=">=", into=True
)
<time_stream.TimeFrame> Size (estimated): 370.00 B
Time properties:
    Time column                   : timestamp  [2023-01-01 00:00:00, ..., 2023-01-01 09:00:00]
    Type                          : Datetime(time_unit='us', time_zone=None)
    Resolution                    : PT0.000001S
    Offset                        : None
    Alignment                     : PT0.000001S
    Periodicity                   : PT0.000001S
    Anchor                        : TimeAnchor.START
Columns:
    temperature                   : Int64  80.00 B  [24, ..., 29]
    precipitation                 : Int64  80.00 B  [-3, ..., 0]
    sensor_codes                  : Int64  80.00 B  [992, ..., 1]
    __qc__temperature__comparison : Boolean  2.00 B  [False, ..., False]
  2. Sensor codes within a list

error_codes = [991, 992, 993, 994, 995]
tf = tf.qc_check(
    "comparison", "sensor_codes", compare_to=error_codes, operator="is_in", into=True
)
<time_stream.TimeFrame> Size (estimated): 370.00 B
Time properties:
    Time column                    : timestamp  [2023-01-01 00:00:00, ..., 2023-01-01 09:00:00]
    Type                           : Datetime(time_unit='us', time_zone=None)
    Resolution                     : PT0.000001S
    Offset                         : None
    Alignment                      : PT0.000001S
    Periodicity                    : PT0.000001S
    Anchor                         : TimeAnchor.START
Columns:
    temperature                    : Int64  80.00 B  [24, ..., 29]
    precipitation                  : Int64  80.00 B  [-3, ..., 0]
    sensor_codes                   : Int64  80.00 B  [992, ..., 1]
    __qc__sensor_codes__comparison : Boolean  2.00 B  [True, ..., False]
  3. Temperatures outside of min and max range (below -10 and above 50)

tf = tf.qc_check(
    "range",
    "temperature",
    min_value=-10,
    max_value=50,
    closed="none",  # Range is not inclusive of min and max value
    within=False,  # Flag values outside of this range
    into=True,
)
<time_stream.TimeFrame> Size (estimated): 370.00 B
Time properties:
    Time column              : timestamp  [2023-01-01 00:00:00, ..., 2023-01-01 09:00:00]
    Type                     : Datetime(time_unit='us', time_zone=None)
    Resolution               : PT0.000001S
    Offset                   : None
    Alignment                : PT0.000001S
    Periodicity              : PT0.000001S
    Anchor                   : TimeAnchor.START
Columns:
    temperature              : Int64  80.00 B  [24, ..., 29]
    precipitation            : Int64  80.00 B  [-3, ..., 0]
    sensor_codes             : Int64  80.00 B  [992, ..., 1]
    __qc__temperature__range : Boolean  2.00 B  [False, ..., False]
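
The interaction between closed and within in the range check can be easier to follow in plain Python. The sketch below is an assumption about the semantics, not the library's code: range_mask is a hypothetical helper, the accepted closed values ("both", "left", "right", "none") are assumed, and the sample data is made up.

```python
def range_mask(values, min_value, max_value, closed="both", within=True):
    """Flag values inside the range (within=True) or outside it (within=False)."""
    include_min = closed in ("both", "left")
    include_max = closed in ("both", "right")
    flags = []
    for v in values:
        above_min = v >= min_value if include_min else v > min_value
        below_max = v <= max_value if include_max else v < max_value
        inside = above_min and below_max
        # within=False inverts the result so that out-of-range values are flagged.
        flags.append(inside if within else not inside)
    return flags


# Made-up sample data, including values exactly on the bounds.
temperature = [24, -10, 50, 52, -15, 29]
flags = range_mask(temperature, -10, 50, closed="none", within=False)
```

Under these assumptions, values that sit exactly on a bound (-10 and 50 here) count as outside an open range, so they are flagged alongside 52 and -15.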
  4. Flag precipitation values between the hours of 01:00 and 03:00

from datetime import time

tf = tf.qc_check(
    "time_range",
    "precipitation",
    min_value=time(1, 0),
    max_value=time(3, 0),
    into=True,
)
<time_stream.TimeFrame> Size (estimated): 370.00 B
Time properties:
    Time column                     : timestamp  [2023-01-01 00:00:00, ..., 2023-01-01 09:00:00]
    Type                            : Datetime(time_unit='us', time_zone=None)
    Resolution                      : PT0.000001S
    Offset                          : None
    Alignment                       : PT0.000001S
    Periodicity                     : PT0.000001S
    Anchor                          : TimeAnchor.START
Columns:
    temperature                     : Int64  80.00 B  [24, ..., 29]
    precipitation                   : Int64  80.00 B  [-3, ..., 0]
    sensor_codes                    : Int64  80.00 B  [992, ..., 1]
    __qc__precipitation__time_range : Boolean  2.00 B  [False, ..., False]
  5. Flag temperature values between 03:30 and 09:30 on 1 January 2023

from datetime import datetime

tf = tf.qc_check(
    "time_range",
    "temperature",
    min_value=datetime(2023, 1, 1, 3, 30),
    max_value=datetime(2023, 1, 1, 9, 30),
    into=True,
)
<time_stream.TimeFrame> Size (estimated): 370.00 B
Time properties:
    Time column                   : timestamp  [2023-01-01 00:00:00, ..., 2023-01-01 09:00:00]
    Type                          : Datetime(time_unit='us', time_zone=None)
    Resolution                    : PT0.000001S
    Offset                        : None
    Alignment                     : PT0.000001S
    Periodicity                   : PT0.000001S
    Anchor                        : TimeAnchor.START
Columns:
    temperature                   : Int64  80.00 B  [24, ..., 29]
    precipitation                 : Int64  80.00 B  [-3, ..., 0]
    sensor_codes                  : Int64  80.00 B  [992, ..., 1]
    __qc__temperature__time_range : Boolean  2.00 B  [False, ..., True]
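
The two time_range examples differ in the kind of bound they use: a time of day, or a full datetime. The following plain-Python sketch shows one way that distinction could work. It is not the library's implementation: time_range_mask is a hypothetical helper, inclusive bounds are an assumption, and hourly timestamps are assumed for the sample data.

```python
from datetime import datetime, time, timedelta


def time_range_mask(timestamps, min_value, max_value):
    """Flag timestamps that fall between min_value and max_value."""
    # Assumption: both bounds are inclusive.
    if isinstance(min_value, datetime):
        # Absolute bounds: compare the full timestamp.
        return [min_value <= ts <= max_value for ts in timestamps]
    # Time-of-day bounds: compare only the clock time of each timestamp.
    return [min_value <= ts.time() <= max_value for ts in timestamps]


# Assumed hourly timestamps from 00:00 to 09:00 on 1 January 2023.
timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(10)]

daily_flags = time_range_mask(timestamps, time(1, 0), time(3, 0))
period_flags = time_range_mask(
    timestamps, datetime(2023, 1, 1, 3, 30), datetime(2023, 1, 1, 9, 30)
)
```

With these assumptions, the time-of-day window flags the 01:00, 02:00 and 03:00 rows, while the datetime window flags every row from 04:00 to 09:00.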
  6. Spike check on temperature data

tf = tf.qc_check(
    "spike", "temperature", threshold=10.0, into=True
)
<time_stream.TimeFrame> Size (estimated): 372.00 B
Time properties:
    Time column              : timestamp  [2023-01-01 00:00:00, ..., 2023-01-01 09:00:00]
    Type                     : Datetime(time_unit='us', time_zone=None)
    Resolution               : PT0.000001S
    Offset                   : None
    Alignment                : PT0.000001S
    Periodicity              : PT0.000001S
    Anchor                   : TimeAnchor.START
Columns:
    temperature              : Int64  80.00 B  [24, ..., 29]
    precipitation            : Int64  80.00 B  [-3, ..., 0]
    sensor_codes             : Int64  80.00 B  [992, ..., 1]
    __qc__temperature__spike : Boolean  4.00 B  [None, ..., None]

Note

The result doesn’t flag the neighbouring high values of 50 and 52. The spike test is designed to detect a single value that jumps away from the “normal” values on either side of it, not a run of consecutive high values.

Note

The result returns null for the first and last values, because the spike test relies on comparing each value with both of its neighbours.
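
Both notes can be illustrated with a simple neighbour-difference test in plain Python. This is one possible formulation, chosen for clarity, and should not be read as the algorithm Time-Stream uses internally: spike_mask is a hypothetical helper and the sample series are made up.

```python
def spike_mask(values, threshold):
    """Flag points that jump away from both neighbours by more than threshold."""
    # The first and last points stay None because each lacks one neighbour.
    flags = [None] * len(values)
    for i in range(1, len(values) - 1):
        diff_prev = values[i] - values[i - 1]
        diff_next = values[i] - values[i + 1]
        # A spike sits above (or below) both neighbours, so the differences share a sign.
        same_direction = diff_prev * diff_next > 0
        flags[i] = (
            same_direction
            and abs(diff_prev) > threshold
            and abs(diff_next) > threshold
        )
    return flags


# Two neighbouring high values (50, 52) versus one isolated high value (60).
plateau_flags = spike_mask([24, 25, 50, 52, 26, 29], threshold=10.0)
single_flags = spike_mask([24, 25, 60, 26, 29], threshold=10.0)
```

In this sketch the pair 50, 52 is not flagged because each value has one neighbour that is close to it, whereas the isolated 60 differs sharply from the values on both sides.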

Observation interval

Specify an observation interval to restrict the QC check to a specific time window. This is useful when:

  • You only want to QC a specific period of observations (e.g. summer 2024).

  • You need to re-run checks on recent data without reprocessing the full archive.

  • You want to exclude known bad periods (e.g. sensor maintenance) from checks.
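
The idea of restricting a check to a time window can be sketched in plain Python as follows. This illustrates the concept rather than the Time-Stream API: check_within_interval is a hypothetical helper, the open-ended end=None option and inclusive bounds are assumptions, and so is the choice to leave rows outside the window unflagged.

```python
from datetime import datetime, timedelta


def check_within_interval(timestamps, values, check, start, end=None):
    """Apply check only to rows whose timestamp falls inside the window."""
    flags = []
    for ts, value in zip(timestamps, values):
        # Assumption: bounds are inclusive, and end=None means "no upper limit".
        in_window = ts >= start and (end is None or ts <= end)
        # Assumption: rows outside the window are left unflagged (False).
        flags.append(check(value) if in_window else False)
    return flags


# Made-up hourly data for illustration.
timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(6)]
temperature = [55, 24, 51, 25, 60, 29]

flags = check_within_interval(
    timestamps,
    temperature,
    check=lambda v: v >= 50,
    start=datetime(2023, 1, 1, 2),
    end=datetime(2023, 1, 1, 4),
)
```

Here the 55 at 00:00 would fail the check, but it is not flagged because it falls outside the window; only the 51 at 02:00 and the 60 at 04:00 are.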

Into

The into argument controls what you get back:

  • into=False → return a boolean Series (mask of failed rows).

  • into=True → add a new boolean column with an automatic name.

  • into="my_column" → add a new boolean column with a custom name.

Note

If a column name already exists, Time-Stream auto-suffixes it to avoid overwriting.
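
The three into options and the auto-suffix behaviour can be summarised as a small naming rule. In the sketch below, the automatic name follows the __qc__<column>__<check> pattern visible in the example outputs above; everything else is an assumption, including the helper name resolve_flag_column and in particular the "_1", "_2" suffix format, which is hypothetical.

```python
def resolve_flag_column(into, column, check, existing_columns):
    """Return the name of the result column, or None when only a mask is returned."""
    if into is False:
        return None  # into=False: the caller gets a boolean mask, no new column
    # into=True uses the automatic name; a string is used as the custom name.
    name = f"__qc__{column}__{check}" if into is True else into
    candidate, suffix = name, 0
    while candidate in existing_columns:
        suffix += 1
        candidate = f"{name}_{suffix}"  # hypothetical suffix format
    return candidate


columns = ["temperature", "precipitation", "sensor_codes"]
first = resolve_flag_column(True, "temperature", "spike", columns)
second = resolve_flag_column(True, "temperature", "spike", columns + [first])
custom = resolve_flag_column("rainfall_flag", "rainfall", "comparison", columns)
```

Running the same check twice would, under this scheme, produce two distinct columns rather than overwriting the first result.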