Quality Control
Build QC your way - flexible checks with a consistent framework.
Why use Time-Stream?
Quality control is essential for any environmental dataset, but QC rules vary between projects, organisations, and sensor types. Time-Stream doesn’t make those decisions for you - instead, it provides a framework for applying common types of QC.
One-liner
QC checks are lightweight, configurable, and explicit:
tf_flagged = tf.qc_check(
    "comparison", "rainfall", compare_to=0, operator="<",
    flag_params=("rainfall_flag", "FLAGGED"),
)
A single call with rich meaning: “Check my rainfall data for values less than 0, and record the result in the rainfall_flag flag column.”
Key benefits
- You stay in control: Flexibility to choose your thresholds, operators, and ranges.
- Reproducible QC: The same logic can be applied across datasets.
- Traceable results: Checks can add explicit boolean columns or flag values for later analysis.
- Flexible: Combine multiple checks, apply them in sequence, or restrict them to intervals.
In more detail
The qc_check() method applies a single QC check to one column.
Each QC check is configurable through parameters specific to that check.
Let’s look at the method in more detail:
- TimeFrame.qc_check(check, column_name, observation_interval=None, flag_params=None, **kwargs)
Apply a quality control check to the TimeFrame.
- Parameters:
  - check (Union[str, Type[QCCheck], QCCheck]) – The QC check to apply.
  - column_name (str) – The column to perform the check on.
  - observation_interval (tuple[datetime, datetime | None] | None) – Optional time interval to limit the check to.
  - flag_params (tuple[str, str | int] | None) – Tuple of (flag column name [str], flag value [str | int]). If provided, add the given flag value to the flag column where the QC check returns True. If not provided, the result of the QC check is returned as a boolean series.
  - **kwargs – Parameters specific to the check type.
- Return type: TimeFrame | Series
- Returns: Result of the QC check, either as a boolean Series or added to the TimeFrame dataframe.
Quality control methods
comparison
time_stream.qc.ComparisonCheck
What it does: Compares values against a constant or list using a comparison operator (<, <=, >, >=, ==, !=, is_in).
When to use: Use for value thresholds (e.g. negative rainfall) or matching against lists of known error codes.
- Additional args:
  - compare_to: The value (or list of values for is_in) to compare against.
  - operator: The comparison operator string.
  - flag_na: If True, also flag NaN/null values as failing the check (default: False).
Example usage:
Temperature greater than or equal to 50:
tf = tf.qc_check(
    "comparison", "temperature", compare_to=50, operator=">=",
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 0           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
Sensor codes within a list:
error_codes = [991, 992, 993, 994, 995]
tf = tf.qc_check(
    "comparison", "sensor_codes", compare_to=error_codes, operator="is_in",
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 1           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 0           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
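To make the operator strings concrete, here is a minimal pure-Python sketch of the comparison logic. It is not Time-Stream's implementation: the names `OPERATORS`, `is_missing` and `comparison_mask` are hypothetical, and mapping missing values to False when flag_na is off is a simplifying assumption.

```python
import math
import operator

# Hypothetical lookup from the operator strings listed above to plain predicates.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "is_in": lambda value, options: value in options,
}


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def comparison_mask(values, compare_to, op, flag_na=False):
    """Return one boolean per value; True marks a value that fails the check."""
    compare = OPERATORS[op]
    # Assumption: a missing value is flagged only when flag_na is True.
    return [flag_na if is_missing(v) else compare(v, compare_to) for v in values]


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
sensor_codes = [992, 1, 1, 1, 1, 1, 1, 991, 995, 1]

hot = comparison_mask(temperature, 50, ">=")
errors = comparison_mask(sensor_codes, [991, 992, 993, 994, 995], "is_in")
```

With the example data, `hot` marks the 50 and 52 readings and `errors` marks the rows carrying codes 992, 991 and 995, matching the flag columns in the two tables above.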
range
What it does: Checks whether values fall inside or outside a min-max interval.
When to use: Use for physical plausibility bounds, such as temperature between -30 and 50 °C.
- Additional args:
  - min_value: Minimum of the range.
  - max_value: Maximum of the range.
  - closed: Which sides of the interval are inclusive - "both", "left", "right", or "none" (default: "both").
  - within: Whether to flag values within the range (True, default) or outside it (False).
Example usage:
Temperatures outside of the range -30 to 50:
tf = tf.qc_check(
    "range", "temperature",
    min_value=-30, max_value=50,
    closed="none",  # Range is not inclusive of min and max value
    within=False,  # Flag values outside of this range
    flag_params=("flag_column", "FLAGGED"),
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
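The interaction between closed and within is easy to misread, so the following pure-Python sketch spells it out. The helper names `in_range` and `range_mask` are hypothetical, and the code only mirrors the parameter descriptions above rather than the library's internals.

```python
def in_range(value, min_value, max_value, closed="both"):
    """True when value lies in the interval, honouring which ends are inclusive."""
    lower_ok = value >= min_value if closed in ("both", "left") else value > min_value
    upper_ok = value <= max_value if closed in ("both", "right") else value < max_value
    return lower_ok and upper_ok


def range_mask(values, min_value, max_value, closed="both", within=True):
    """Flag values inside the interval (within=True) or outside it (within=False)."""
    return [in_range(v, min_value, max_value, closed) == within for v in values]


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]

# Same settings as the example above: open interval, flag what falls outside it.
outside_open = range_mask(temperature, -30, 50, closed="none", within=False)

# With the default closed="both", the boundary value 50 counts as inside.
outside_closed = range_mask(temperature, -30, 50, closed="both", within=False)
```

Because closed="none" excludes both endpoints, the reading of exactly 50 counts as outside the range and is flagged in the table above; under the sketch's reading, closed="both" would leave it unflagged.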
time_range
What it does: Flags rows where the primary time column falls within a given range. Accepts datetime.time, datetime.date, or datetime.datetime bounds.
When to use: Use for known bad periods such as sensor outages or automated calibration times.
- Additional args:
  - min_value: Start of the time range.
  - max_value: End of the time range.
  - closed: Which sides of the interval are inclusive - "both", "left", "right", or "none" (default: "both").
  - within: Whether to flag values within the range (True, default) or outside it (False).
Example usage:
Flag precipitation values between the hours of 01:00 and 03:00:
from datetime import datetime, time
tf = tf.qc_check(
    "time_range", "precipitation",
    min_value=time(1, 0), max_value=time(3, 0),
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 1           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 0           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 0           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
Flag temperature values between 03:30 and 09:30 on the 1st January:
tf = tf.qc_check(
    "time_range", "temperature",
    min_value=datetime(2023, 1, 1, 3, 30),
    max_value=datetime(2023, 1, 1, 9, 30),
    flag_params=("flag_column", "FLAGGED"),
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 0           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 1           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 1           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 1           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
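The two examples differ in what the bounds are compared against: a time-of-day bound applies to every day, while a datetime bound pins one specific period. The sketch below illustrates that distinction in plain Python. `time_range_mask` is a hypothetical helper that covers only the default settings (closed="both", within=True) and leaves out datetime.date bounds.

```python
from datetime import datetime, time, timedelta


def time_range_mask(timestamps, min_value, max_value):
    """Flag timestamps inside an inclusive range (a sketch, not the library code)."""
    if isinstance(min_value, time):
        # Time-of-day bounds: compare only the clock time of each timestamp.
        return [min_value <= ts.time() <= max_value for ts in timestamps]
    # datetime bounds: compare the full timestamp.
    return [min_value <= ts <= max_value for ts in timestamps]


# Hourly timestamps matching the example tables.
timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(10)]

nightly = time_range_mask(timestamps, time(1, 0), time(3, 0))
outage = time_range_mask(
    timestamps, datetime(2023, 1, 1, 3, 30), datetime(2023, 1, 1, 9, 30)
)
```

Here `nightly` marks the 01:00, 02:00 and 03:00 rows and `outage` marks 04:00 through 09:00, in line with the two tables above.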
spike
What it does: Detects sudden jumps by assessing differences with neighbouring values. A point is flagged when the combined neighbour difference (minus skew) exceeds twice the threshold.
When to use: Use for detecting unrealistic single-point spikes - isolated values that jump sharply compared to their neighbours.
- Additional args:
  - threshold: The spike detection threshold.
Example usage:
Spike check on temperature data:
tf = tf.qc_check(
    "spike", "temperature", threshold=10.0,
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ flag_column │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---         │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ i64         │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ 0           │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ 0           │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ 0           │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ 0           │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ 0           │
└─────────────────────┴─────────────┴───────────────┴──────────────┴─────────────┘
Note
The result doesn’t flag the neighbouring high values of 50 and 52. The spike test detects a sudden jump where one value sits between otherwise normal values.
Note
The result returns null for the first and last values; the spike test relies on comparisons with neighbouring values.
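One way to read the rule "combined neighbour difference (minus skew) exceeds twice the threshold" is sketched below in plain Python. The exact formula is an assumption rather than something taken from Time-Stream's source: the combined difference is taken as the absolute gap between the step into a point and the step out of it, and the skew as the imbalance between the sizes of those two steps. `spike_mask` is a hypothetical name.

```python
def spike_mask(values, threshold):
    """Return True/False per interior point and None for the two end points."""
    result = [None] * len(values)
    for i in range(1, len(values) - 1):
        step_in = values[i] - values[i - 1]   # change arriving at the point
        step_out = values[i + 1] - values[i]  # change leaving the point
        # Assumed formula: opposite-signed steps add up, same-signed steps cancel.
        combined = abs(step_in - step_out)
        skew = abs(abs(step_in) - abs(step_out))
        result[i] = (combined - skew) > 2 * threshold
    return result


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
spikes = spike_mask(temperature, threshold=10.0)
```

Under this reading, only the -35 reading is flagged: it drops by 57 and recovers by 61, so the combined difference is 118, the skew is 4, and 114 exceeds the limit of 20. The jump to 50 and 52 scores 0 because the steps in and out point the same way, which is consistent with the first note above.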
flat_line
What it does: Detects consecutive repeated (or near-repeated) values in a column.
When to use: Use when a sensor stuck at a fixed value should be flagged as suspect.
- Additional args:
  - min_count: Minimum number of consecutive repeated values required for a flat line (must be at least 2).
  - tolerance: Optional tolerance for near-equality comparison. When set, consecutive values differing by less than or equal to this amount are considered equal (default: None, exact equality).
  - ignore_value: Optional value or list of values that are allowed to repeat without being flagged.
Example usage:
Flag temperature values stuck at the same reading for 3 or more consecutive timesteps:
tf = tf.qc_check(
    "flat_line", "temperature", min_count=3,
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 3)
┌─────────────────────┬─────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ flag_column │
│ ---                 ┆ ---         ┆ ---         │
│ datetime[μs]        ┆ f64         ┆ i64         │
╞═════════════════════╪═════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0        ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 02:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 22.0        ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 21.0        ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 0.0         ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 0.0         ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 0.0         ┆ 1           │
└─────────────────────┴─────────────┴─────────────┘
Using ignore_value - suppress flagging when the repeated value is 0.0:
tf = tf.qc_check(
    "flat_line", "temperature", min_count=3, ignore_value=0.0,
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 3)
┌─────────────────────┬─────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ flag_column │
│ ---                 ┆ ---         ┆ ---         │
│ datetime[μs]        ┆ f64         ┆ i64         │
╞═════════════════════╪═════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0        ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 02:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 22.0        ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 21.0        ┆ 0           │
│ 2023-01-01 07:00:00 ┆ 0.0         ┆ 0           │
│ 2023-01-01 08:00:00 ┆ 0.0         ┆ 0           │
│ 2023-01-01 09:00:00 ┆ 0.0         ┆ 0           │
└─────────────────────┴─────────────┴─────────────┘
Note
More than one ignore_value can be specified in a list, e.g. [0.0, 20.0].
Using tolerance - flag values that barely change (within 0.1) for 3 or more consecutive readings:
The data below drifts slightly around 20 °C (varying by less than 0.1 between readings) before jumping to a different range. The tolerance parameter catches these near-flat runs that exact equality would miss.
tf = tf.qc_check(
    "flat_line", "temperature", min_count=3, tolerance=0.1,
    flag_params=("flag_column", "FLAGGED")
)
shape: (10, 3)
┌─────────────────────┬─────────────┬─────────────┐
│ timestamp           ┆ temperature ┆ flag_column │
│ ---                 ┆ ---         ┆ ---         │
│ datetime[μs]        ┆ f64         ┆ i64         │
╞═════════════════════╪═════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 18.0        ┆ 0           │
│ 2023-01-01 01:00:00 ┆ 20.0        ┆ 1           │
│ 2023-01-01 02:00:00 ┆ 20.005      ┆ 1           │
│ 2023-01-01 03:00:00 ┆ 20.001      ┆ 1           │
│ 2023-01-01 04:00:00 ┆ 19.991      ┆ 1           │
│ 2023-01-01 05:00:00 ┆ 22.0        ┆ 0           │
│ 2023-01-01 06:00:00 ┆ 20.99       ┆ 1           │
│ 2023-01-01 07:00:00 ┆ 21.003      ┆ 1           │
│ 2023-01-01 08:00:00 ┆ 21.009      ┆ 1           │
│ 2023-01-01 09:00:00 ┆ 20.997      ┆ 1           │
└─────────────────────┴─────────────┴─────────────┘
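The run-detection idea behind these examples can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's implementation: `flat_line_mask` is a hypothetical helper, a run is built by comparing each value with the one before it, and a run is skipped only when every value in it appears in ignore_value.

```python
def flat_line_mask(values, min_count, tolerance=None, ignore_value=None):
    """Flag every value that belongs to a run of min_count or more repeats."""
    if min_count < 2:
        raise ValueError("min_count must be at least 2")
    if ignore_value is None:
        ignored = []
    elif isinstance(ignore_value, (list, tuple, set)):
        ignored = list(ignore_value)
    else:
        ignored = [ignore_value]

    def same(a, b):
        # Exact equality by default, otherwise equality within the tolerance.
        return a == b if tolerance is None else abs(a - b) <= tolerance

    flags = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or where the next value differs.
        if i == len(values) or not same(values[i - 1], values[i]):
            run = values[start:i]
            if len(run) >= min_count and not all(v in ignored for v in run):
                flags[start:i] = [True] * len(run)
            start = i
    return flags


stuck = [18.0, 20.0, 20.0, 20.0, 20.0, 22.0, 21.0, 0.0, 0.0, 0.0]
drift = [18.0, 20.0, 20.005, 20.001, 19.991, 22.0, 20.99, 21.003, 21.009, 20.997]

exact = flat_line_mask(stuck, min_count=3)
skip_zero = flat_line_mask(stuck, min_count=3, ignore_value=0.0)
near_flat = flat_line_mask(drift, min_count=3, tolerance=0.1)
```

The three results reproduce the flag columns of the three tables above. Comparing neighbours pairwise means a slow drift could, in this sketch, stay within tolerance step by step while moving a long way overall; whether Time-Stream behaves the same way is not stated in the text.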
Observation interval
Specify an observation interval to restrict the QC check to a specific time window. This is useful when:
- You only want to QC a specific period of observations (e.g. summer 2024).
- You need to re-run checks on recent data without reprocessing the full archive.
- You want to exclude known bad periods (e.g. sensor maintenance) from checks.
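As a rough illustration of what restricting a check to a time window means, the sketch below applies a (start, end) tuple to an existing check result in plain Python. The helper names are hypothetical, and three behaviours are assumptions rather than documented facts: both bounds are inclusive, an end of None leaves the window open-ended, and rows outside the window are reported as not flagged.

```python
from datetime import datetime, timedelta


def interval_mask(timestamps, observation_interval=None):
    """True for each timestamp that falls inside the observation interval."""
    if observation_interval is None:
        return [True] * len(timestamps)
    start, end = observation_interval
    # Assumption: inclusive bounds, and end=None means no upper limit.
    return [ts >= start and (end is None or ts <= end) for ts in timestamps]


def restrict(check_result, timestamps, observation_interval=None):
    """Keep a failure only where the row lies inside the interval."""
    inside = interval_mask(timestamps, observation_interval)
    return [failed and ok for failed, ok in zip(check_result, inside)]


timestamps = [datetime(2023, 1, 1) + timedelta(hours=h) for h in range(10)]
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
too_hot = [t >= 50 for t in temperature]

# Only consider rows from 08:00 onwards.
recent_only = restrict(too_hot, timestamps, (datetime(2023, 1, 1, 8), None))
```

With the window starting at 08:00, the 50 reading at 07:00 is left alone and only the 52 reading at 08:00 remains flagged.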
Flag Parameters
The result of a QC check can be consumed in one of two ways, selected via the flag_params argument:
- Boolean series (default, flag_params omitted) - qc_check returns a Polars boolean Series of the same length as the TimeFrame, with True marking the rows that failed the check. Useful for chaining into custom expressions or feeding into add_flag() manually.
- Flag column update (flag_params=(flag_column_name, flag_value)) - qc_check adds the given flag value to the named flag column on each failing row and returns a new TimeFrame. The flag column must already exist (see Flagging).
The examples on this page use the flag-column style. Each sets up a bitwise flag system with a single FLAGGED flag and calls init_flag_column() before running the check:
tf.register_flag_system("qc", {"FLAGGED": 1})
tf.init_flag_column("qc", "flag_column")
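For readers new to bitwise flags, the following plain-Python sketch shows why adding a flag value to a column that already holds other flags is safe. It assumes flags are combined with a bitwise OR and that a freshly initialised flag column holds zeros (as the 0 entries in the example tables suggest). The SUSPECT flag and the helpers `set_flag` and `has_flag` are hypothetical and are not part of Time-Stream.

```python
# FLAGGED mirrors the flag system registered above; SUSPECT is a made-up second flag.
FLAGS = {"FLAGGED": 1, "SUSPECT": 2}


def set_flag(current, flag_value):
    """Add a flag to a cell without disturbing flags that are already set."""
    return current | flag_value


def has_flag(current, flag_value):
    """True when the given flag is present in the cell."""
    return current & flag_value == flag_value


flag_column = [0] * 10  # assumed starting state of a new flag column
failed = [False, False, False, False, False, False, False, True, True, False]

flag_column = [
    set_flag(cell, FLAGS["FLAGGED"]) if bad else cell
    for cell, bad in zip(flag_column, failed)
]

# A second, independent flag can share the same cell.
both = set_flag(flag_column[7], FLAGS["SUSPECT"])
```

Because each flag occupies its own bit, setting FLAGGED twice leaves the cell at 1, and a cell value of 3 records both flags at once.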