EvaluateConfig
Every tunable threshold and penalty weight for the Evaluate health score, plus override recipes.
```python
from stera import Evaluate, EvaluateConfig

cfg = EvaluateConfig(
    sync_target=95.0,
    hand_2_weight=0.0,
    depth_required=False,
)
# `session` is assumed to be a recording session you have already loaded.
Evaluate(session, config=cfg).show()
```

EvaluateConfig is a frozen-by-convention dataclass with three kinds of fields:
- Color thresholds: (good_min, ok_min) pairs that drive the green / amber / red chips in the report.
- Health-score deductions: target + weight pairs that drive the 0–100 score. Set any weight to 0 to disable that check.
- Required-stream flags: booleans (depth_required, imu_required) that gate the corresponding *_missing_penalty. Both default to True.
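Because the config is frozen only by convention, nothing stops code from mutating a shared instance. One way to derive a variant without touching the original is the standard library's dataclasses.replace. The sketch below uses DemoConfig, a hypothetical stand-in that mirrors three documented fields so the snippet runs without the SDK. Applying the same call to a real EvaluateConfig assumes it is an ordinary dataclass, as described above.

```python
from dataclasses import dataclass, replace


@dataclass
class DemoConfig:
    """Hypothetical stand-in for EvaluateConfig (three documented fields only)."""
    sync_target: float = 90.0
    hand_2_weight: float = 0.10
    depth_required: bool = True


base = DemoConfig()

# replace() returns a new instance with the given fields overridden;
# `base` keeps its original values.
strict = replace(base, sync_target=95.0, hand_2_weight=0.0)
```

Fields that are not named in the call (here depth_required) are copied from the base instance unchanged.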
Hand checks have three buckets:
- hand_1plus_*: frames with at least 1 hand (= any-hand).
- hand_2_*: frames with exactly 2 hands.
- frames_with_more_hands*: frames with strictly more than 2 hands. Metric only; it has no config knobs because it is a detection-error indicator rather than a quality signal.
Defaults at a glance
```python
EvaluateConfig(
    # color thresholds (good_min, ok_min)
    health_thresholds      = (80.0, 60.0),
    depth_valid_thresholds = (80.0, 50.0),
    sync_thresholds        = (90.0, 70.0),
    hand_any_thresholds    = (70.0, 30.0),
    hand_1plus_thresholds  = (40.0, 15.0),
    hand_2_thresholds      = (30.0, 10.0),
    imu_gravity_max_dev    = 0.5,   # m/s² Δ from 9.81 for green

    # required-stream flags (True = penalise when missing)
    depth_required = True,
    imu_required   = True,

    # health-score deductions
    rgb_gap_max_penalty   = 10.0,
    depth_valid_target    = 80.0,
    depth_valid_weight    = 0.3,
    depth_missing_penalty = 10.0,
    pose_missing_penalty  = 15.0,
    imu_missing_penalty   = 5.0,
    sync_target           = 90.0,
    sync_weight           = 0.1,
    hand_any_target       = 50.0,
    hand_any_weight       = 0.15,
    hand_1plus_target     = 30.0,
    hand_1plus_weight     = 0.05,
    hand_2_target         = 20.0,
    hand_2_weight         = 0.10,
    hand_missing_penalty  = 0.0,
)
```

Color thresholds
A value ≥ good_min renders green, ≥ ok_min amber, otherwise red. Higher is always better.
| Field | Default | Drives |
|---|---|---|
| health_thresholds | (80, 60) | The big score number in the Summary and the Good / Watch / Issues status word. |
| depth_valid_thresholds | (80, 50) | "Valid mean" chip in the Depth section. |
| sync_thresholds | (90, 70) | The three RGB↔X chips in the Sync section. |
| hand_any_thresholds | (70, 30) | "Any hand" chip in the Hands section. |
| hand_1plus_thresholds | (40, 15) | "≥1 hand" chip. |
| hand_2_thresholds | (30, 10) | "2 hands" chip (exactly two). |
imu_gravity_max_dev (default 0.5 m/s²) is a single number, not a pair. The |gravity| chip in the IMU section is green when |9.81 − gravity_magnitude| ≤ imu_gravity_max_dev, amber otherwise.
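The two colouring rules above can be sketched as plain functions. This illustrates the documented rules and is not the SDK's rendering code; chip_color and gravity_chip_color are hypothetical names.

```python
def chip_color(value: float, thresholds: tuple[float, float]) -> str:
    """Map a higher-is-better metric to a chip colour via (good_min, ok_min)."""
    good_min, ok_min = thresholds
    if value >= good_min:
        return "green"
    if value >= ok_min:
        return "amber"
    return "red"


def gravity_chip_color(gravity_magnitude: float, max_dev: float = 0.5) -> str:
    """Green when |9.81 - gravity_magnitude| <= max_dev, amber otherwise."""
    return "green" if abs(9.81 - gravity_magnitude) <= max_dev else "amber"


sync_chip = chip_color(75.0, (90.0, 70.0))   # between ok_min and good_min
gravity_chip = gravity_chip_color(9.6)       # deviation of about 0.21 m/s²
```

Under this rule a (0.0, 0.0) pair makes every non-negative value render green, so zeroing a threshold pair takes the amber and red states off that chip.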
Required-stream flags
Two booleans control whether a missing stream triggers its flat *_missing_penalty. Both default to True — the score will drop when these streams are absent. Flip to False to make the stream optional for the score (the corresponding sections in the report still render as "no data" either way).
| Field | Default | Effect when False |
|---|---|---|
| depth_required | True | depth_missing_penalty is skipped even if the depth stream is absent. |
| imu_required | True | imu_missing_penalty is skipped even if the IMU stream is absent. |
Pose is mandatory unconditionally (the SDK assumes ARKit pose is always present for an MCAP from the Stera App). If you need to make pose optional, set pose_missing_penalty=0.
Health-score deductions
| Field | Default | What it does |
|---|---|---|
| rgb_gap_max_penalty | 10 | Caps the RGB-frame-gap deduction. Each detected gap subtracts 1 up to this cap. Set to 0 to disable. |
| depth_valid_target | 80 | Below this, deducts (target − actual) × depth_valid_weight. |
| depth_valid_weight | 0.3 | Scales the depth deduction. 0 disables. |
| depth_missing_penalty | 10 | Flat deduction when there is no depth stream and depth_required=True. |
| pose_missing_penalty | 15 | Flat deduction when /camera/pose is missing. |
| imu_missing_penalty | 5 | Flat deduction when /device/imu is missing and imu_required=True. |
| sync_target | 90 | within_50ms_pct threshold per sync pair. |
| sync_weight | 0.1 | Per-pair deduction (target − pct) × weight. Applied separately to RGB↔Depth and RGB↔Pose. |
| hand_any_target | 50 | Below this, deducts (target − any_pct) × hand_any_weight. |
| hand_any_weight | 0.15 | Scales the any-hand deduction. 0 disables. |
| hand_1plus_target | 30 | Same shape for frames with ≥1 hand. |
| hand_1plus_weight | 0.05 | Scales the ≥1-hand deduction. 0 disables. |
| hand_2_target | 20 | Same shape for frames with exactly 2 hands. |
| hand_2_weight | 0.10 | Scales the exactly-2-hands deduction. 0 disables. |
| hand_missing_penalty | 0 | Flat deduction when session.add_hand_pose was never called. Off by default so headless QC of recordings without hand detection still scores 100. |
The full deduction logic lives in stera.eval.metrics.compute_health. See Health score for the formulae and a worked example.
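To show how the rows in the table combine, here is a rough sketch of the stream, depth and sync deductions. It is not the body of compute_health. DemoConfig, shortfall, sketch_score and the keyword arguments are hypothetical names, and starting from 100 and clamping the result at 0 are assumptions. The hand deductions are left out for brevity; per the table they follow the same (target − actual) × weight shape.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DemoConfig:
    """Hypothetical stand-in carrying the documented defaults."""
    depth_required: bool = True
    imu_required: bool = True
    rgb_gap_max_penalty: float = 10.0
    depth_valid_target: float = 80.0
    depth_valid_weight: float = 0.3
    depth_missing_penalty: float = 10.0
    pose_missing_penalty: float = 15.0
    imu_missing_penalty: float = 5.0
    sync_target: float = 90.0
    sync_weight: float = 0.1


def shortfall(target: float, actual: float, weight: float) -> float:
    """(target - actual) * weight when below target, otherwise 0."""
    return max(0.0, target - actual) * weight


def sketch_score(
    cfg: DemoConfig,
    *,
    rgb_gaps: int = 0,
    depth_valid_pct: Optional[float] = None,  # None means no depth stream
    has_pose: bool = True,
    has_imu: bool = True,
    sync_pcts: Sequence[float] = (),          # one within_50ms_pct per sync pair
) -> float:
    score = 100.0  # assumption: a clean recording starts at 100
    # Each RGB gap costs 1 point, capped at rgb_gap_max_penalty.
    score -= min(float(rgb_gaps), cfg.rgb_gap_max_penalty)
    if depth_valid_pct is None:
        if cfg.depth_required:
            score -= cfg.depth_missing_penalty
    else:
        score -= shortfall(cfg.depth_valid_target, depth_valid_pct, cfg.depth_valid_weight)
    if not has_pose:
        score -= cfg.pose_missing_penalty
    if not has_imu and cfg.imu_required:
        score -= cfg.imu_missing_penalty
    for pct in sync_pcts:
        score -= shortfall(cfg.sync_target, pct, cfg.sync_weight)
    return max(0.0, score)  # assumption: the score never goes below 0


degraded = sketch_score(DemoConfig(), rgb_gaps=3, depth_valid_pct=70.0, sync_pcts=(85.0, 95.0))
missing = sketch_score(DemoConfig(), has_imu=False)
optional = sketch_score(DemoConfig(depth_required=False, imu_required=False), has_imu=False)
```

With the defaults, degraded loses 3 (gaps) + 3.0 (depth) + 0.5 (one sync pair below target) and comes out at 93.5. missing has no depth and no IMU, so it takes the flat 10 + 5 and scores 85. optional describes the same recording with both streams marked optional and stays at 100.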
Override recipes
Stricter sync
```python
EvaluateConfig(sync_target=95.0, sync_weight=0.2)
```

Raises the bar to 95% of RGB frames within 50 ms of their nearest depth/pose sample, and doubles sync_weight so each percentage point of shortfall costs twice as much.
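To see what this recipe does for a single sync pair, the per-pair formula from the deductions table can be evaluated for both configs. The 88% figure is a made-up input, sync_deduction is a hypothetical helper, and treating values at or above the target as a zero deduction is an assumption.

```python
def sync_deduction(pct: float, target: float, weight: float) -> float:
    """Per-pair sync deduction: (target - pct) * weight, floored at 0."""
    return max(0.0, target - pct) * weight


pct = 88.0  # hypothetical within_50ms_pct for one RGB pair

default_hit = sync_deduction(pct, target=90.0, weight=0.1)  # 2 points short of target
strict_hit = sync_deduction(pct, target=95.0, weight=0.2)   # 7 points short of target
```

The default config deducts about 0.2 points for this pair and the stricter one about 1.4. The raised target and the doubled weight compound, so the increase is larger than 2×.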
Single-handed task
```python
EvaluateConfig(hand_2_weight=0.0, hand_2_thresholds=(0.0, 0.0))
```

Stops penalising low exactly-2-hand frame counts and removes the amber and red states from the "2 hands" chip, since any value ≥ 0 now renders green. Useful for recordings where one hand is holding the camera.
Rig without depth / IMU
```python
EvaluateConfig(depth_required=False, imu_required=False)
```

Marks both streams optional. A recording from a non-LiDAR phone or an external camera no longer takes the flat 10 + 5 hit for missing those topics.
Headless QC: only fail on missing streams
```python
EvaluateConfig(
    depth_valid_weight=0.0,
    sync_weight=0.0,
    hand_any_weight=0.0,
    hand_1plus_weight=0.0,
    hand_2_weight=0.0,
    rgb_gap_max_penalty=0.0,
)
```

Disables every threshold-based check. The score only drops when a whole stream is missing (depth, pose, IMU), which is the bare minimum you'd want in a CI pipeline.
Require hand annotations
```python
EvaluateConfig(hand_missing_penalty=20.0)
```

Run this after a detection pipeline that should have called session.add_hand_pose. If the buffer is empty because the detection loop never wrote any hand poses, the score drops by 20.
Inspecting the active config
Evaluate.compute() puts the config it ran with on the metrics dict, and the report dumps the same in Technical reference → Health score config:
```python
metrics = Evaluate(session, config=my_cfg).compute()
assert metrics["config"] is my_cfg
```

Future revisions may add new fields (e.g. tracking-state penalties). They'll land with safe defaults, so existing configs will continue to score the same way unless you opt in.