Quality Control
Build QC your way - flexible checks with a consistent framework.
Why use Time-Stream?
Quality control is essential for any environmental dataset, but QC rules vary between projects, organisations, and sensor types. Time-Stream doesn’t make those decisions for you - instead, it provides a framework for applying common types of QC.
One-liner
QC checks are lightweight, configurable, and explicit:
tf_flagged = tf.qc_check(
    "comparison", "rainfall", compare_to=0, operator="<", into="rainfall_flag"
)
A single call with rich meaning: “I want to flag rows where my rainfall data is less than 0, with the results saved to a column named rainfall_flag.”
Key benefits
You stay in control: choose your own thresholds, operators, and ranges.
Reproducible QC: the same logic can be applied across datasets.
Traceable results: checks can add explicit boolean columns or flag values for later analysis.
Flexible: combine multiple checks, apply them in sequence, or restrict them to intervals.
In more detail
The qc_check() method applies a single QC check to one column. It can return a boolean mask (for filtering) or update the TimeFrame with a new column containing the results of the QC check. Each QC check is configurable through parameters specific to that check; see the examples below.
Available checks
"comparison" - compare values against a constant or a list using the operators <, <=, >, >=, ==, !=, is_in. Use for value thresholds or lists of error codes.
"range" - check whether values lie inside or outside a min–max interval. Use for physical plausibility bounds (e.g. temperature between −50 and 50 °C).
"time_range" - flag data within specific time ranges. Use for known bad periods such as sensor outages or calibration times.
"spike" - detect sudden jumps using neighbour differences. Use for unrealistic single-point spikes.
Examples:
Temperature greater than or equal to 50
tf = tf.qc_check(
    "comparison", "temperature", compare_to=50, operator=">=", into=True
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__comparison │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
Sensor codes within a list
error_codes = [991, 992, 993, 994, 995]
tf = tf.qc_check(
    "comparison", "sensor_codes", compare_to=error_codes, operator="is_in", into=True
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__sensor_codes__compariso │
│ --- ┆ --- ┆ --- ┆ --- ┆ n │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ --- │
│ ┆ ┆ ┆ ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ true │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
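To make the comparison logic concrete, here is a minimal standalone sketch in plain Python of what a comparison check computes for each row. It does not call Time-Stream; `OPERATORS` and `comparison_flags` are hypothetical names used only for illustration, and the sample values are copied from the tables above.

```python
import operator

# Hypothetical lookup, not part of Time-Stream: maps the operator strings
# listed under "Available checks" to plain Python functions.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "is_in": lambda value, options: value in options,
}


def comparison_flags(values, compare_to, op):
    """Return one boolean per value: True where the comparison holds."""
    compare = OPERATORS[op]
    return [compare(value, compare_to) for value in values]


# Sample columns copied from the example tables.
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
sensor_codes = [992, 1, 1, 1, 1, 1, 1, 991, 995, 1]

temperature_flags = comparison_flags(temperature, 50, ">=")
code_flags = comparison_flags(sensor_codes, [991, 992, 993, 994, 995], "is_in")

print(temperature_flags)
print(code_flags)
```

The two printed lists correspond to the boolean columns in the two comparison tables above.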
Temperatures outside the min–max range (-10 or below, and 50 or above)
tf = tf.qc_check(
    "range",
    "temperature",
    min_value=-10,
    max_value=50,
    closed="none",  # Range is not inclusive of min and max value
    within=False,  # Flag values outside of this range
    into=True,
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬──────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__range │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪══════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴──────────────────────────┘
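The interaction between closed and within explains why the value 50 is flagged here. The sketch below is a plain-Python illustration of that logic, not Time-Stream's implementation. `range_flags` is a hypothetical helper; only closed="none" appears in the example above, so the other closed values ("both", "left", "right") and the default arguments are assumptions borrowed from common interval conventions.

```python
def range_flags(values, min_value, max_value, closed="both", within=True):
    """Flag values inside (within=True) or outside (within=False) a range.

    closed says which bounds belong to the range. Only "none" is taken from
    the documented example; "both", "left" and "right" are assumed here.
    """
    include_min = closed in ("both", "left")
    include_max = closed in ("both", "right")
    flags = []
    for value in values:
        above_min = value >= min_value if include_min else value > min_value
        below_max = value <= max_value if include_max else value < max_value
        in_range = above_min and below_max
        flags.append(in_range if within else not in_range)
    return flags


# Temperature column copied from the example table.
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]

# Same bounds as the call above: with closed="none" the bounds themselves are
# not part of the range, so 50 counts as outside and is flagged.
outside_flags = range_flags(temperature, -10, 50, closed="none", within=False)
print(outside_flags)
```

With closed="both" in this sketch, 50 would count as inside the range and only -35 and 52 would be flagged.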
Flag precipitation values between the hours of 01:00 and 03:00
from datetime import time

tf = tf.qc_check(
    "time_range",
    "precipitation",
    min_value=time(1, 0),
    max_value=time(3, 0),
    into=True,
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__precipitation__time_ran │
│ --- ┆ --- ┆ --- ┆ --- ┆ g… │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ --- │
│ ┆ ┆ ┆ ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ true │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
Flag temperature values between 03:30 and 09:30 on 1 January 2023
from datetime import datetime

tf = tf.qc_check(
    "time_range",
    "temperature",
    min_value=datetime(2023, 1, 1, 3, 30),
    max_value=datetime(2023, 1, 1, 9, 30),
    into=True,
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__time_range │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ true │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ true │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────────────────────────┘
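The two time_range examples differ in the type of their bounds: datetime.time values select a time-of-day window, while datetime.datetime values select one absolute period. The following plain-Python sketch shows that distinction. `time_range_flags` is a hypothetical helper, not Time-Stream's code, and it treats both bounds as inclusive because the 01:00 and 03:00 rows are flagged in the first table.

```python
from datetime import datetime, time, timedelta


def time_range_flags(timestamps, min_value, max_value):
    """Flag timestamps that fall between two bounds (bounds included).

    datetime bounds are compared against the full timestamp;
    time bounds are compared against the time of day only.
    """
    if isinstance(min_value, datetime):
        return [min_value <= ts <= max_value for ts in timestamps]
    return [min_value <= ts.time() <= max_value for ts in timestamps]


# Hourly timestamps matching the example tables (00:00 to 09:00).
timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(10)]

daily_flags = time_range_flags(timestamps, time(1, 0), time(3, 0))
period_flags = time_range_flags(
    timestamps, datetime(2023, 1, 1, 3, 30), datetime(2023, 1, 1, 9, 30)
)

print(daily_flags)
print(period_flags)
```

On a multi-day series, the time-of-day form in this sketch would flag the 01:00 to 03:00 window on every day, whereas the datetime form flags a single period.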
Spike check on temperature data
tf = tf.qc_check(
    "spike", "temperature", threshold=10.0, into=True
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬──────────────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ __qc__temperature__spike │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪══════════════════════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ null │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ null │
└─────────────────────┴─────────────┴───────────────┴──────────────┴──────────────────────────┘
Note
The result doesn’t flag the neighbouring high values of 50 and 52. The spike test is designed to detect a sudden jump in a single value that sits between “normal” values.
Note
The result returns null for the first and last values, because the spike test relies on comparisons with neighbouring values.
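Both notes follow from how a neighbour-difference test works. The sketch below shows one possible formulation in plain Python: a point is flagged when it moves away from its previous and next neighbours in opposite directions, and both jumps exceed the threshold. This is an assumption made for illustration and may differ from Time-Stream's actual algorithm; `spike_flags` is a hypothetical name. On the example data it gives the same flags as the table above.

```python
def spike_flags(values, threshold):
    """Flag single-point spikes using differences to both neighbours.

    The first and last values have only one neighbour, so they stay None
    (shown as null in the table).
    """
    flags = [None] * len(values)
    for i in range(1, len(values) - 1):
        diff_prev = values[i] - values[i - 1]
        diff_next = values[i + 1] - values[i]
        # A spike goes up then down (or down then up): the signs differ.
        opposite_directions = diff_prev * diff_next < 0
        smaller_jump = min(abs(diff_prev), abs(diff_next))
        flags[i] = opposite_directions and smaller_jump > threshold
    return flags


# Temperature column copied from the example table.
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]

flags = spike_flags(temperature, threshold=10.0)
print(flags)
```

In this formulation, 50 is not flagged because the series keeps rising to 52, and 52 is not flagged because its jump from 50 is only 2. The value -35 is flagged because it drops by 57 and then rises by 61.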
Observation interval
Specify an observation interval to restrict the QC check to a specific time window. This is useful when:
You only want to QC a specific period of observations (e.g. summer 2024).
You need to re-run checks on recent data without reprocessing the full archive.
You want to exclude known bad periods (e.g. sensor maintenance) from checks.
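As a rough illustration of the idea, the plain-Python sketch below applies a check only to rows whose timestamp falls inside a given window. It does not use Time-Stream: `check_in_interval` is a hypothetical helper, and leaving rows outside the window as False is an assumption, since this section does not state how Time-Stream reports them.

```python
from datetime import datetime, timedelta


def check_in_interval(timestamps, values, check, start, end):
    """Apply check(value) only where start <= timestamp <= end.

    Assumption: rows outside the interval are reported as False.
    """
    flags = []
    for ts, value in zip(timestamps, values):
        in_interval = start <= ts <= end
        flags.append(in_interval and check(value))
    return flags


# Hourly timestamps and temperatures matching the example tables.
timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(10)]
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]

# Check "temperature >= 50", but only between 00:00 and 07:00.
interval_flags = check_in_interval(
    timestamps,
    temperature,
    check=lambda value: value >= 50,
    start=datetime(2023, 1, 1, 0, 0),
    end=datetime(2023, 1, 1, 7, 0),
)
print(interval_flags)
```

Here the value 52 at 08:00 would fail the check, but it lies outside the window and so is not flagged.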
Into
The into argument controls what you get back:
into=False → return a boolean Series (mask of failed rows).
into=True → add a new boolean column with an automatic name.
into="my_column" → add a new boolean column with a custom name.
Note
If a column name already exists, Time-Stream auto-suffixes it to avoid overwriting.
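The naming behaviour can be sketched in plain Python. The automatic names in the example tables follow the pattern `__qc__<column>__<check>`; the suffix format used when a name is already taken is not shown in this section, so the `_1`, `_2` scheme below is purely an assumption, as is the hypothetical helper `result_column_name`.

```python
def result_column_name(existing_columns, check_column, check_name, into=True):
    """Pick the name of the result column without overwriting an existing one.

    into=True  -> automatic name "__qc__<column>__<check>" (as in the tables).
    into="..." -> the custom name supplied by the caller.
    The numeric suffix format is an assumption for illustration only.
    """
    base = f"__qc__{check_column}__{check_name}" if into is True else into
    name = base
    counter = 1
    while name in existing_columns:
        name = f"{base}_{counter}"
        counter += 1
    return name


columns = ["timestamp", "temperature", "precipitation", "sensor_codes"]

first = result_column_name(columns, "temperature", "range")
second = result_column_name(columns + [first], "temperature", "range")
custom = result_column_name(columns, "precipitation", "comparison", into="rainfall_flag")

print(first)
print(second)
print(custom)
```

Running the same check twice in this sketch produces two distinct columns instead of overwriting the first result.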