Quality control checks
The quality control (QC) system in the Time Stream library provides a flexible framework for flagging potential issues in time series data. It allows users to define and apply QC checks to individual columns of a TimeSeries object. The results can then be used to update flag columns, or to guide further inspection or filtering of your data.
Applying a QC Check
To apply a QC check, call the TimeSeries.qc_check method on a TimeSeries object. This method allows you to:
- Specify the check type (see below for the available built-in quality control checks)
- Choose the column to evaluate
- Optionally limit the QC check to a time observation window
Built-in Quality Control Checks
Several built-in QC checks are available. Each check encapsulates a validation rule and supports configuration through parameters specific to that check.
The examples below all use this TimeSeries object:
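from datetime import date, datetime, time, timedelta

import polars as pl
from time_stream import TimeSeries  # assumed import path; check against your installed version

# Ten hourly observations starting at midnight on 1 January 2023
dates = [datetime(2023, 1, 1) + timedelta(hours=i) for i in range(10)]
temperatures = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
precipitation = [-3, 0, 5, 10, 2, 0, 0, 3, 1, 0]
sensor_codes = [992, 1, 1, 1, 1, 1, 1, 991, 995, 1]
df = pl.DataFrame({
    "timestamp": dates,
    "temperature": temperatures,
    "precipitation": precipitation,
    "sensor_codes": sensor_codes,
})
ts = TimeSeries(
    df=df,
    time_name="timestamp",
)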
Comparison Check
Compares values in the TimeSeries with a constant value using a specified operator.
Name: "comparison"
- class time_stream.qc.ComparisonCheck(compare_to, operator, flag_na=False)
Compares values against a given value using a comparison operator.
Initialize comparison check.
The is_in operator is a special case, where you must pass a list of values to check against. The check then flags each value in the TimeSeries that appears in this list.
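To make the operator semantics concrete, here is a minimal plain-Python sketch of the comparison rule. It is independent of Time Stream and is not the library's implementation. The helper name comparison_flags and the operator table are hypothetical: only ">=", "<" and "is_in" are confirmed by the examples below, and the flag_na option is not modelled.

```python
import operator
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical operator table; only ">=", "<" and "is_in" are confirmed by the docs.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def comparison_flags(values: Sequence[Any], compare_to: Any, op: str) -> List[bool]:
    """Return True for each value that satisfies `value <op> compare_to`."""
    if op == "is_in":
        # Special case: compare_to is a collection of (hashable) values; flag membership
        allowed = set(compare_to)
        return [value in allowed for value in values]
    compare = OPERATORS[op]  # raises KeyError for an unsupported operator
    return [compare(value, compare_to) for value in values]


temperatures = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
sensor_codes = [992, 1, 1, 1, 1, 1, 1, 991, 995, 1]
print(comparison_flags(temperatures, 50, ">="))
print(comparison_flags(sensor_codes, [991, 992, 993, 994, 995], "is_in"))
```

Running it gives the same true/false pattern as the qc_result columns in examples 1 and 3 below.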
Examples
1. Temperature greater than or equal to 50
ts.df = ts.df.with_columns(
    ts.qc_check(
        "comparison",
        check_column="temperature",
        compare_to=50,
        operator=">="
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
2. Precipitation less than 0
ts.df = ts.df.with_columns(
    ts.qc_check(
        "comparison",
        check_column="precipitation",
        compare_to=0,
        operator="<"
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ true │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
3. Sensor codes within a list
ts.df = ts.df.with_columns(
    ts.qc_check(
        "comparison",
        check_column="sensor_codes",
        compare_to=[991, 992, 993, 994, 995],
        operator="is_in"
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ true │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
Range Check
Flags values in the TimeSeries that fall outside or within a specified value range.
Name: "range"
- class time_stream.qc.RangeCheck(min_value, max_value, closed='both', within=True)
Check that values fall within an acceptable range.
Initialize range check.
- Parameters:
min_value (float | time | date | datetime) – Minimum of the range.
max_value (float | time | date | datetime) – Maximum of the range.
closed (str | ClosedInterval | None) – Define which sides of the interval are closed (inclusive) {‘both’, ‘left’, ‘right’, ‘none’} (default = “both”)
within (bool | None) – Whether values get flagged when within or outside the range (default = True (within)).
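The interaction between closed and within can be sketched in plain Python as follows. This is an illustrative model, not the library's code. The helper names are hypothetical, and the sketch assumes a value is flagged when it lies in the interval (within=True) or when it does not (within=False).

```python
from typing import Any, List, Sequence


def in_range(value: Any, min_value: Any, max_value: Any, closed: str = "both") -> bool:
    """Return True if value lies in the interval; `closed` selects the inclusive ends."""
    if closed == "both":
        return min_value <= value <= max_value
    if closed == "left":
        return min_value <= value < max_value
    if closed == "right":
        return min_value < value <= max_value
    if closed == "none":
        return min_value < value < max_value
    raise ValueError(f"unknown closed option: {closed!r}")


def range_flags(
    values: Sequence[Any],
    min_value: Any,
    max_value: Any,
    closed: str = "both",
    within: bool = True,
) -> List[bool]:
    """Flag values inside the range (within=True) or outside it (within=False)."""
    return [in_range(v, min_value, max_value, closed) == within for v in values]


temperatures = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
precipitation = [-3, 0, 5, 10, 2, 0, 0, 3, 1, 0]
print(range_flags(temperatures, -10, 50, closed="none", within=False))
print(range_flags(precipitation, -3, 1, closed="both", within=True))
```

Because closed="none" excludes the end points, a value exactly equal to max_value (50 here) is outside the range and is flagged when within=False.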
Examples
1. Temperatures outside the range -10 to 50, exclusive of the end points (i.e. at or below -10, or at or above 50)
ts.df = ts.df.with_columns(
    ts.qc_check(
        "range",
        check_column="temperature",
        min_value=-10,
        max_value=50,
        closed="none",  # Range is not inclusive of min and max value
        within=False,  # Flag values outside of this range
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
2. Precipitation values between -3 and 1 (inclusive)
ts.df = ts.df.with_columns(
    ts.qc_check(
        "range",
        check_column="precipitation",
        min_value=-3,
        max_value=1,
        closed="both",  # Range is inclusive of min and max value
        within=True,  # Flag values inside of this range
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ true │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ true │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
Time Range Check
Flags values in the TimeSeries whose timestamps, in the primary time column, fall outside or within a specified time range.
The min / max values can be of type:
- datetime.time: useful where there are consistent errors at a certain time of day, e.g., during automated sensor calibration.
- datetime.date: useful where a specific date range is known to be bad, e.g., during a period of known sensor malfunction.
- datetime.datetime: as above, but where you also need to include a time of day in the range limits.
Name: "time_range"
Note
This is equivalent to using RangeCheck with check_column = ts.time_name. It is a convenience method that makes it explicit that we are working with the primary time column of the TimeSeries object.
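One way to picture how a single check can accept time, date or datetime bounds is to reduce each timestamp to the bound's type before comparing. The plain-Python sketch below does that. It is an assumption about the behaviour, consistent with the examples below, and not Time Stream's implementation; the helper names are hypothetical. Both ends are treated as inclusive, matching RangeCheck's default closed="both".

```python
from datetime import date, datetime, time, timedelta
from typing import List, Sequence, Union

Bound = Union[time, date, datetime]


def project(timestamp: datetime, bound: Bound) -> Bound:
    """Reduce a timestamp to the same type as `bound` so the two can be compared."""
    # datetime is a subclass of date, so it must be checked first
    if isinstance(bound, datetime):
        return timestamp
    if isinstance(bound, date):
        return timestamp.date()
    if isinstance(bound, time):
        return timestamp.time()
    raise TypeError(f"unsupported bound type: {type(bound).__name__}")


def time_range_flags(timestamps: Sequence[datetime], min_value: Bound, max_value: Bound) -> List[bool]:
    """Flag timestamps in [min_value, max_value]; both bounds are assumed to share a type."""
    return [min_value <= project(t, min_value) <= max_value for t in timestamps]


dates = [datetime(2023, 1, 1) + timedelta(hours=i) for i in range(10)]
print(time_range_flags(dates, time(1, 0), time(3, 0)))
print(time_range_flags(dates, datetime(2023, 1, 1, 3, 30), datetime(2023, 1, 1, 9, 30)))
print(time_range_flags(dates, date(2023, 1, 1), date(2023, 1, 2)))
```

With a time bound only the time of day matters, so the same window would be flagged on every day of a longer series.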
Examples
1. Flag values between the hours of 01:00 and 03:00
ts.df = ts.df.with_columns(
    ts.qc_check(
        "time_range",
        check_column="temperature",
        min_value=time(1, 0),
        max_value=time(3, 0)
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ true │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
2. Flag values between 03:30 and 09:30 on 1st January
ts.df = ts.df.with_columns(
    ts.qc_check(
        "time_range",
        check_column="temperature",
        min_value=datetime(2023, 1, 1, 3, 30),
        max_value=datetime(2023, 1, 1, 9, 30),
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ true │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ true │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
3. Flag values between 1st January and 2nd January
ts.df = ts.df.with_columns(
    ts.qc_check(
        "time_range",
        check_column="temperature",
        min_value=date(2023, 1, 1),
        max_value=date(2023, 1, 2),
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ true │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ true │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ true │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ true │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ true │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ true │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ true │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
Spike Check
Flags sudden jumps between values based on their differences with adjacent values (both previous and next).
Note
The first and last values in a time series cannot be assessed by the spike test as it requires neighbouring values. The result for the first and last items will be set to NULL.
Name: "spike"
- class time_stream.qc.SpikeCheck(threshold)
Detect spikes by assessing differences with neighboring values.
Initialize spike detection check.
- Parameters:
threshold (float) – The spike detection threshold.
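The exact spike formula is not documented here. As a hedged sketch, the plain-Python helper below uses one simple criterion: a value is a spike when its absolute differences from both the previous and the next value exceed threshold. This is an assumption and not necessarily the library's rule. It does, however, reproduce the example output below, including the null results at both ends.

```python
from typing import List, Optional, Sequence


def spike_flags(values: Sequence[float], threshold: float) -> List[Optional[bool]]:
    """Flag values that jump by more than `threshold` relative to BOTH neighbours.

    The first and last values have only one neighbour, so they are left as None.
    """
    flags: List[Optional[bool]] = [None] * len(values)
    for i in range(1, len(values) - 1):
        jump_from_previous = abs(values[i] - values[i - 1])
        jump_to_next = abs(values[i + 1] - values[i])
        flags[i] = jump_from_previous > threshold and jump_to_next > threshold
    return flags


temperatures = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
print(spike_flags(temperatures, threshold=10.0))
```

Under this rule 50 is not a spike because it differs from the following 52 by only 2, and 52 is not a spike because it differs from the preceding 50 by only 2, so the pair of high values is left unflagged.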
Examples
Spike check on temperature
Note
The result does not flag the adjacent high values of 50 and 52. The spike check is designed to detect a single value that jumps away from the “normal” values on either side of it.
ts.df = ts.df.with_columns(
    ts.qc_check(
        "spike",
        check_column="temperature",
        threshold=10.0
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ null │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ true │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ null │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
Applying QC checks during a specific time range
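The observation_interval argument, given as a (start, end) tuple, constrains the QC check to a chunk of your time series. Rows outside the interval are not flagged and are returned as false. In the example below, none of the sample timestamps (all on 1 January 2023) fall between 5 and 10 January 2023. Every row is therefore false, including the -35, 50 and 52 readings that the unrestricted range check flagged.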
# Only check data from specific dates
start_date = datetime(2023, 1, 5)
end_date = datetime(2023, 1, 10)
ts.df = ts.df.with_columns(
    ts.qc_check(
        "range",
        check_column="temperature",
        min_value=-10,
        max_value=50,
        closed="none",
        within=False,
        observation_interval=(start_date, end_date)
    ).alias("qc_result")
)
print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24 ┆ -3 ┆ 992 ┆ false │
│ 2023-01-01 01:00:00 ┆ 22 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 02:00:00 ┆ -35 ┆ 5 ┆ 1 ┆ false │
│ 2023-01-01 03:00:00 ┆ 26 ┆ 10 ┆ 1 ┆ false │
│ 2023-01-01 04:00:00 ┆ 24 ┆ 2 ┆ 1 ┆ false │
│ 2023-01-01 05:00:00 ┆ 26 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 06:00:00 ┆ 28 ┆ 0 ┆ 1 ┆ false │
│ 2023-01-01 07:00:00 ┆ 50 ┆ 3 ┆ 991 ┆ false │
│ 2023-01-01 08:00:00 ┆ 52 ┆ 1 ┆ 995 ┆ false │
│ 2023-01-01 09:00:00 ┆ 29 ┆ 0 ┆ 1 ┆ false │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
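The effect of observation_interval on the output can be modelled as a mask applied to a check's results. The plain-Python sketch below is hypothetical: the helper name and the inclusive interval bounds are assumptions. Reporting rows outside the window as false matches the example output above.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple


def apply_observation_interval(
    timestamps: Sequence[datetime],
    flags: Sequence[Optional[bool]],
    observation_interval: Tuple[datetime, datetime],
) -> List[Optional[bool]]:
    """Keep each flag only inside the interval (bounds assumed inclusive); outside rows become False."""
    start, end = observation_interval
    return [flag if start <= t <= end else False for t, flag in zip(timestamps, flags)]


dates = [datetime(2023, 1, 1) + timedelta(hours=i) for i in range(10)]
# Flags from the unrestricted range check on temperature (outside -10 to 50, exclusive)
range_result = [False, False, True, False, False, False, False, True, True, False]
print(apply_observation_interval(dates, range_result, (datetime(2023, 1, 5), datetime(2023, 1, 10))))
```

Narrowing the window to 00:00 to 07:00 on 1 January would keep the flags for -35 and 50 but drop the one for 52 at 08:00.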