Quality Control
Build QC your way: flexible checks within a consistent framework.
Why use Time-Stream?
Quality control is essential for any environmental dataset, but QC rules vary between projects, organisations, and sensor types. Time-Stream doesn’t make those decisions for you; instead, it provides a framework for applying common types of QC check.
One-liner
QC checks are lightweight, configurable, and explicit:
tf_flagged = tf.qc_check(
"comparison", "rainfall", compare_to=0, operator="<", into="rainfall_flag"
)
A single call with a clear meaning: “Flag the rows where my rainfall data is less than 0, and save the results to a column named rainfall_flag.”
Key benefits
You stay in control: choose your own thresholds, operators, and ranges.
Reproducible QC: apply the same logic across datasets.
Traceable results: checks can add explicit boolean columns or flag values for later analysis.
Flexible: combine multiple checks, apply them in sequence, or restrict them to intervals.
In more detail
The qc_check() method applies a single QC check to one column.
Depending on the into argument, it either returns a boolean mask (useful for filtering) or returns the TimeFrame with a new column
containing the results of the check. Each check is configured through its own parameters; see the examples below.
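Because every check produces one boolean per row, results from several checks can be combined before filtering. The sketch below illustrates the idea in plain Python, with lists standing in for the masks that qc_check() returns when into=False. It does not call Time-Stream, and the variable names are hypothetical. The temperatures are the sample values used in the examples that follow.

```python
# Illustrative only: plain lists stand in for boolean masks returned by
# qc_check(..., into=False). Temperatures match the sample table below.
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]

# Two independent checks, each producing one bool per row.
too_hot = [t >= 50 for t in temperature]   # like a ">=" comparison check
too_cold = [t < -30 for t in temperature]  # like a "<" comparison check

# A row fails overall if it fails either check.
failed = [hot or cold for hot, cold in zip(too_hot, too_cold)]

# Use the combined mask to keep only the rows that passed every check.
clean = [t for t, bad in zip(temperature, failed) if not bad]
print(clean)  # [24, 22, 26, 24, 26, 28, 29]
```

The same pattern extends to any number of checks.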
Comparison check
Compare values against a constant or list using operators: <, <=, >, >=, ==, !=, is_in.
Use for value thresholds or lists of error codes.
Example: Temperature greater than or equal to 50:
tf = tf.qc_check(
"comparison", "temperature", compare_to=50, operator=">=", into=True
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__comparison │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
Example: Sensor codes within a list:
error_codes = [991, 992, 993, 994, 995]
tf = tf.qc_check(
"comparison", "sensor_codes", compare_to=error_codes, operator="is_in", into=True
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__sensor_codes__compariso │
│ --- ┆ --- ┆ --- ┆ --- ┆ n │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ --- │
│ ┆ ┆ ┆ ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ true │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
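To make the operator semantics concrete, here is a minimal pure-Python sketch of what a comparison check computes for each row. The OPERATORS table and comparison_mask() are hypothetical helpers, not part of Time-Stream. The sketch also ignores nulls, which a real implementation must handle. With the sample data it reproduces both flag columns shown above.

```python
import operator

# Hypothetical mapping from the operator strings accepted by the
# "comparison" check to Python callables; "is_in" tests list membership.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "is_in": lambda value, options: value in options,
}


def comparison_mask(values, compare_to, op):
    """Return one bool per value: True where `value <op> compare_to` holds."""
    func = OPERATORS[op]
    return [func(v, compare_to) for v in values]


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
sensor_codes = [992, 1, 1, 1, 1, 1, 1, 991, 995, 1]

hot = comparison_mask(temperature, 50, ">=")
errors = comparison_mask(sensor_codes, [991, 992, 993, 994, 995], "is_in")
```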
Range check
Check whether values lie inside or outside a min-max interval.
Use for physical plausibility bounds (e.g. temperature between -30 and 50 °C).
Example: Temperatures outside of the range -30 to 50:
tf = tf.qc_check(
    "range",
    "temperature",
    min_value=-30,
    max_value=50,
    closed="none",  # Exclude both bounds: exactly -30 or 50 counts as outside
    within=False,  # Flag values that fall outside the range
    into=True,
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬──────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__range │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪══════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴──────────────────────────┘
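The interaction between closed and within is easy to get backwards, so here is a pure-Python sketch of the logic. range_mask() is a hypothetical helper. The closed options other than "none" ("both", "left", "right") are an assumption that follows the Polars convention, so check the Time-Stream API reference before relying on them.

```python
def range_mask(values, min_value, max_value, closed="both", within=True):
    """Flag values inside (within=True) or outside (within=False) the range."""
    if closed not in ("both", "left", "right", "none"):
        raise ValueError(f"unknown closed option: {closed!r}")
    # Which bounds count as part of the range (naming assumed, as in Polars).
    include_min = closed in ("both", "left")
    include_max = closed in ("both", "right")
    flags = []
    for v in values:
        above_min = v >= min_value if include_min else v > min_value
        below_max = v <= max_value if include_max else v < max_value
        inside = above_min and below_max
        flags.append(inside if within else not inside)
    return flags


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
outside = range_mask(temperature, -30, 50, closed="none", within=False)
```

With closed="none", the reading of exactly 50 is outside the range and gets flagged, as in the table above. With closed="both", it would pass.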
Time range check
Flag data that falls within a specific time range.
Use for known bad periods such as sensor outages or calibration times.
Example: Flag precipitation values recorded between 01:00 and 03:00 (inclusive):
from datetime import time

tf = tf.qc_check(
    "time_range",
    "precipitation",
    min_value=time(1, 0),
    max_value=time(3, 0),
    into=True,
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__precipitation__time_ran │
│ --- ┆ --- ┆ --- ┆ --- ┆ g… │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ --- │
│ ┆ ┆ ┆ ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ true │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
Example: Flag temperature values between 03:30 and 09:30 on 1 January 2023:
from datetime import datetime

tf = tf.qc_check(
    "time_range",
    "temperature",
    min_value=datetime(2023, 1, 1, 3, 30),
    max_value=datetime(2023, 1, 1, 9, 30),
    into=True,
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__time_range │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ true │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ true │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
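Both examples above can be sketched in plain Python. Bounds given as time objects are compared against each row's clock time, while bounds given as datetime objects define one absolute window. time_range_mask() is hypothetical. It treats both bounds as inclusive, which matches the tables above, and it does not handle windows that cross midnight, such as 22:00 to 02:00.

```python
from datetime import datetime, time, timedelta


def time_range_mask(timestamps, min_value, max_value):
    """Flag timestamps in [min_value, max_value], both bounds inclusive.

    `time` bounds match the clock time on any date; `datetime` bounds
    match a single absolute window.
    """
    flags = []
    for ts in timestamps:
        # datetime is not a subclass of time, so this only fires for time bounds.
        key = ts.time() if isinstance(min_value, time) else ts
        flags.append(min_value <= key <= max_value)
    return flags


start = datetime(2023, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(10)]

by_clock = time_range_mask(timestamps, time(1, 0), time(3, 0))
by_date = time_range_mask(
    timestamps, datetime(2023, 1, 1, 3, 30), datetime(2023, 1, 1, 9, 30)
)
```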
Spike check
Detect sudden jumps using neighbour differences.
Use for unrealistic single-point spikes.
Example: Spike check on temperature data:
tf = tf.qc_check(
"spike", "temperature", threshold=10.0, into=True
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬──────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__spike │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪══════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ null │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ null │
└─────────────────────┴─────────────┴───────────────┴──────────────┴──────────────────────────┘
Note
The check does not flag the consecutive high values of 50 and 52. The spike test looks for a single value that departs sharply from both of its neighbours. Because 50 and 52 sit next to each other, neither one is an isolated jump.
Note
The first and last values are null because the spike test compares each value with its neighbours on both sides,
and these rows have only one neighbour.
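The notes above can be illustrated with one widely used spike formulation from oceanographic QC. It takes a value's distance from the midpoint of its two neighbours and subtracts half the difference between those neighbours. This formula is an assumption for illustration. Time-Stream's own formula is not documented on this page, although this version does reproduce the flags in the table above.

```python
def spike_mask(values, threshold):
    """Flag single-point spikes; first and last values are None (one neighbour)."""
    flags = [None] * len(values)
    for i in range(1, len(values) - 1):
        prev, curr, nxt = values[i - 1], values[i], values[i + 1]
        # How far the value sits from the midpoint of its neighbours...
        departure = abs(curr - (prev + nxt) / 2)
        # ...minus half the neighbours' own difference, so a steady ramp or a
        # step to a new level scores <= 0 and is not mistaken for a spike.
        score = departure - abs(nxt - prev) / 2
        flags[i] = score > threshold
    return flags


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
spikes = spike_mask(temperature, threshold=10.0)
```

For the 50 at 07:00, the departure from the neighbours' midpoint (40) is 10, but half the neighbours' difference is 12. The score is therefore negative, which is why the pair of high values is not flagged.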
Flat line check
Detect consecutive repeated (or near-repeated) values.
Use when a sensor stuck at a fixed value should be flagged as suspect.
Example (using a separate, temperature-only sample dataset): Flag temperature values stuck at the same reading for 3 or more consecutive timesteps:
tf = tf.qc_check(
"flat_line", "temperature", min_count=3, into=True
)
shape: (10, 3)
┌─────────────────────┬─────────────┬──────────────────────────────┐
│ timestamp ┆ temperature ┆ __qc__temperature__flat_line │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ bool │
╞═════════════════════╪═════════════╪══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0 ┆ false │
│ 2023-01-01 01:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 02:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 03:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 04:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 05:00:00 ┆ 22.0 ┆ false │
│ 2023-01-01 06:00:00 ┆ 21.0 ┆ false │
│ 2023-01-01 07:00:00 ┆ 0.0 ┆ true │
│ 2023-01-01 08:00:00 ┆ 0.0 ┆ true │
│ 2023-01-01 09:00:00 ┆ 0.0 ┆ true │
└─────────────────────┴─────────────┴──────────────────────────────┘
Example: Use ignore_value to suppress flagging when the repeated value is 0.0:
tf = tf.qc_check(
"flat_line", "temperature", min_count=3, ignore_value=0.0, into=True
)
shape: (10, 3)
┌─────────────────────┬─────────────┬──────────────────────────────┐
│ timestamp ┆ temperature ┆ __qc__temperature__flat_line │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ bool │
╞═════════════════════╪═════════════╪══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0 ┆ false │
│ 2023-01-01 01:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 02:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 03:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 04:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 05:00:00 ┆ 22.0 ┆ false │
│ 2023-01-01 06:00:00 ┆ 21.0 ┆ false │
│ 2023-01-01 07:00:00 ┆ 0.0 ┆ false │
│ 2023-01-01 08:00:00 ┆ 0.0 ┆ false │
│ 2023-01-01 09:00:00 ┆ 0.0 ┆ false │
└─────────────────────┴─────────────┴──────────────────────────────┘
Note
To ignore several values, pass a list, e.g. ignore_value=[0.0, 20.0].
Example: Use tolerance to flag values that barely change (within 0.1) for 3 or more consecutive readings:
In the data below, temperature hovers around 20 °C and later around 21 °C, changing by only a few hundredths of a degree
between readings. Exact equality would miss these near-flat runs; with tolerance=0.1 they are flagged.
tf = tf.qc_check(
"flat_line", "temperature", min_count=3, tolerance=0.1, into=True
)
shape: (10, 3)
┌─────────────────────┬─────────────┬──────────────────────────────┐
│ timestamp ┆ temperature ┆ __qc__temperature__flat_line │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ bool │
╞═════════════════════╪═════════════╪══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0 ┆ false │
│ 2023-01-01 01:00:00 ┆ 20.0 ┆ true │
│ 2023-01-01 02:00:00 ┆ 20.005 ┆ true │
│ 2023-01-01 03:00:00 ┆ 20.001 ┆ true │
│ 2023-01-01 04:00:00 ┆ 19.991 ┆ true │
│ 2023-01-01 05:00:00 ┆ 22.0 ┆ false │
│ 2023-01-01 06:00:00 ┆ 20.99 ┆ true │
│ 2023-01-01 07:00:00 ┆ 21.003 ┆ true │
│ 2023-01-01 08:00:00 ┆ 21.009 ┆ true │
│ 2023-01-01 09:00:00 ┆ 20.997 ┆ true │
└─────────────────────┴─────────────┴──────────────────────────────┘
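A pure-Python sketch of the flat line logic follows. flat_line_mask() is hypothetical, and two of its behaviours are assumptions. First, tolerance is applied to the difference between consecutive readings. Second, a run is skipped only when every value in it appears in ignore_value. Under those assumptions the sketch reproduces all three tables above.

```python
def flat_line_mask(values, min_count, tolerance=0.0, ignore_value=None):
    """Flag runs of at least `min_count` consecutive (near-)equal values."""
    # Normalise ignore_value to a list (None -> nothing ignored).
    if ignore_value is None:
        ignore = []
    elif isinstance(ignore_value, (list, tuple, set)):
        ignore = list(ignore_value)
    else:
        ignore = [ignore_value]

    flags = [False] * len(values)
    start = 0  # index where the current run began
    for i in range(1, len(values) + 1):
        # A run ends at the last value or where the step exceeds the tolerance.
        if i == len(values) or abs(values[i] - values[i - 1]) > tolerance:
            run = values[start:i]
            if len(run) >= min_count and not all(v in ignore for v in run):
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


flat = [18.0, 20.0, 20.0, 20.0, 20.0, 22.0, 21.0, 0.0, 0.0, 0.0]
drift = [18.0, 20.0, 20.005, 20.001, 19.991, 22.0, 20.99, 21.003, 21.009, 20.997]

stuck = flat_line_mask(flat, min_count=3)
stuck_ignoring_zero = flat_line_mask(flat, min_count=3, ignore_value=0.0)
near_flat = flat_line_mask(drift, min_count=3, tolerance=0.1)
```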
Additional parameters
Observation interval
Specify an observation interval to restrict the QC check to a specific time window. This is useful when:
You only want to QC a specific period of observations (e.g. summer 2024).
You need to re-run checks on recent data without reprocessing the full archive.
You want to exclude known bad periods (e.g. sensor maintenance) from checks.
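As a sketch of what restricting a check to a window means, the hypothetical helper below keeps results only for rows inside an inclusive start/end window. Every other row is marked None (not checked). Time-Stream's parameter name and its exact handling of out-of-window rows are not shown on this page, so treat this only as an illustration of the idea.

```python
from datetime import datetime, timedelta


def restrict_to_interval(timestamps, mask, start, end):
    """Keep QC results only where start <= timestamp <= end.

    Rows outside the window come back as None ("not checked").
    """
    return [
        flag if start <= ts <= end else None
        for ts, flag in zip(timestamps, mask)
    ]


base = datetime(2023, 1, 1)
timestamps = [base + timedelta(hours=h) for h in range(10)]
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
mask = [t >= 50 for t in temperature]  # any per-row check result

windowed = restrict_to_interval(
    timestamps, mask, datetime(2023, 1, 1, 0), datetime(2023, 1, 1, 7)
)
```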
Into
The into argument controls what you get back:
into=False → return a boolean Series (a mask of the rows that failed the check).
into=True → add a new boolean column with an automatic name of the form __qc__<column>__<check>.
into="my_column" → add a new boolean column with a custom name.
Note
If a column name already exists, Time-Stream auto-suffixes it to avoid overwriting.
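The naming rules can be summarised in a short pure-Python sketch. resolve_into() is a hypothetical helper. The automatic name pattern comes from the example tables above. The _1, _2 suffix format is an assumption, because the exact suffix Time-Stream appends is not stated here.

```python
def resolve_into(into, column, check, existing_columns):
    """Return None when a bare mask is wanted, else the new flag column's name."""
    if into is False:
        return None  # the caller gets the boolean mask back instead of a column
    if into is True:
        name = f"__qc__{column}__{check}"  # pattern seen in the example tables
    else:
        name = into  # custom name supplied by the user
    # Avoid overwriting: append a numeric suffix (assumed format) until free.
    candidate, n = name, 1
    while candidate in existing_columns:
        candidate = f"{name}_{n}"
        n += 1
    return candidate


print(resolve_into(True, "temperature", "range", []))  # __qc__temperature__range
print(resolve_into("rainfall_flag", "rainfall", "comparison", ["rainfall_flag"]))  # rainfall_flag_1
```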