Health score
The 0–100 rollup at the top of every Evaluate report — how it's computed, what gets deducted, and where the colour thresholds live.
The health score is a deliberately simple deduction model. It starts at 100 and subtracts a penalty for each check that looks off. The result is clamped to [0, 100]. Each active deduction shows up as a one-line note next to the score in the report.
metrics = Evaluate(session).compute()
metrics["health"]["score"] # 87.4
metrics["health"]["notes"] # ["Depth valid 72%", "1 RGB frame gaps"]
The ruleset
Every penalty is max(0, target − actual) × weight, with two exceptions: RGB gaps subtract min(gap_count, rgb_gap_max_penalty) directly, and "missing stream" penalties are flat. Setting any *_weight to 0 disables that check entirely.
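The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: `linear_penalty`, `rgb_gap_penalty`, and `clamp_score` are hypothetical helper names, and the input values are made up for the example.

```python
def linear_penalty(target: float, actual: float, weight: float) -> float:
    """max(0, target - actual) * weight; a weight of 0 disables the check."""
    return max(0.0, target - actual) * weight


def rgb_gap_penalty(gap_count: int, max_penalty: float = 10.0) -> float:
    """RGB gaps subtract one point per gap, capped at max_penalty."""
    return min(float(gap_count), max_penalty)


def clamp_score(raw: float) -> float:
    """Clamp the final score to [0, 100]."""
    return max(0.0, min(100.0, raw))


# Hypothetical inputs: depth valid at 70 % (target 80, weight 0.3) and 3 RGB gaps.
deductions = [linear_penalty(80.0, 70.0, 0.3), rgb_gap_penalty(3)]
score = clamp_score(100.0 - sum(deductions))
print(round(score, 1))  # 94.0
```

A metric at or above its target contributes nothing, because the `max(0, ...)` floor stops a good value from adding points back.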
Hand checks use three buckets: ≥1 hand (at-least-one detection), exactly 2 hands, and >2 hands (metric only — typically a detection-error indicator, no deduction by default).
| Check | Default penalty | What triggers it |
|---|---|---|
| RGB frame gaps | min(gap_count, 10) | Inter-frame interval > 2 × median dt. |
| Depth valid % below target | (80 − actual) × 0.3 | valid_pct_mean below depth_valid_target. |
| Depth missing | −10 (flat) | No depth stream and depth_required=True (default). |
| Pose missing | −15 (flat) | No /camera/pose messages. |
| IMU missing | −5 (flat) | No /device/imu messages and imu_required=True (default). |
| Sync (RGB↔Depth) below target | (90 − pct) × 0.1 | within_50ms_pct below sync_target. |
| Sync (RGB↔Pose) below target | (90 − pct) × 0.1 | Same calculation, separate deduction. |
| Any-hand % below target | (50 − pct) × 0.15 | frames_with_any_hand_pct below hand_any_target. |
| ≥1-hand % below target | (30 − pct) × 0.05 | frames_with_1plus_hand_pct below hand_1plus_target. |
| Exactly-2-hand % below target | (20 − pct) × 0.10 | frames_with_2_hands_pct below hand_2_target. |
| Hand-pose buffer empty | 0 (off by default) | Set hand_missing_penalty > 0 to require session.add_hand_pose. |
All thresholds and weights live on EvaluateConfig. See the config reference for the full field list with defaults.
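The RGB gap trigger in the table (an inter-frame interval greater than 2 × the median dt) can be illustrated as follows. This is a sketch under assumptions: `count_rgb_gaps` is a hypothetical helper, timestamps are taken to be in seconds, and the library's actual gap detection may differ in its details.

```python
from statistics import median


def count_rgb_gaps(timestamps: list[float], factor: float = 2.0) -> int:
    """Count inter-frame intervals longer than factor x the median interval."""
    if len(timestamps) < 3:
        return 0  # too few frames to estimate a meaningful median interval
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    threshold = factor * median(intervals)
    return sum(1 for dt in intervals if dt > threshold)


# Hypothetical 10 Hz stream with one dropped stretch between 0.3 s and 0.7 s.
stamps = [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
gaps = count_rgb_gaps(stamps)
penalty = min(gaps, 10)  # default cap from the table
print(gaps, penalty)  # 1 1
```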
Making streams optional
Depth and IMU are marked mandatory by default: if the recording is missing the topic entirely, the score takes a flat hit (10 and 5 respectively). For headless QC or rigs without a depth sensor, flip them off:
EvaluateConfig(depth_required=False, imu_required=False)
The corresponding _missing_penalty value is then ignored, so the recording can be missing the stream without penalty. The colour-coded chips for valid-depth %, gravity, etc. still render normally when the stream is present.
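One way to picture the effect of the `*_required` flags is the sketch below. `missing_stream_penalty` is a hypothetical helper rather than part of the library; the flat values 10 and 5 are the defaults from the ruleset table.

```python
def missing_stream_penalty(present: bool, required: bool, flat_penalty: float) -> float:
    """Flat deduction that applies only when a required stream is absent."""
    if present or not required:
        return 0.0
    return flat_penalty


# Recording without depth or IMU, scored with the defaults (both required).
default_hit = missing_stream_penalty(False, True, 10.0) + missing_stream_penalty(False, True, 5.0)

# Same recording with depth_required=False and imu_required=False.
relaxed_hit = missing_stream_penalty(False, False, 10.0) + missing_stream_penalty(False, False, 5.0)

print(default_hit, relaxed_hit)  # 15.0 0.0
```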
Worked example
Imagine a 12-minute recording where:
- Depth valid % averaged 62 % (below the 80 target).
- RGB↔Pose sync had 84 % of frames within 50 ms (below the 90 target).
- ≥1 hand detected on 22 % of frames (below both the 50 any-hand target and the 30 ≥1-hand target).
- Exactly 2 hands on 8 % of frames (below the 20 default target).
Defaults would deduct:
| Check | Math | Deduction |
|---|---|---|
| Depth valid | (80 − 62) × 0.3 | 5.4 |
| RGB↔Pose sync | (90 − 84) × 0.1 | 0.6 |
| Any-hand | (50 − 22) × 0.15 | 4.2 |
| ≥1-hand | (30 − 22) × 0.05 | 0.4 |
| Exactly-2-hand | (20 − 8) × 0.10 | 1.2 |
| Final score | 100 − 11.8 | 88.2 |
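The table's arithmetic can be reproduced directly. The snippet below is only a check of the numbers above, using a hypothetical `linear_penalty` helper rather than the library's scoring code; it applies the 22 % figure to both the any-hand and ≥1-hand rows, as the table does.

```python
def linear_penalty(target: float, actual: float, weight: float) -> float:
    """max(0, target - actual) * weight."""
    return max(0.0, target - actual) * weight


# (check, target, actual, weight), taken from the worked example.
checks = [
    ("Depth valid", 80.0, 62.0, 0.30),
    ("RGB-Pose sync", 90.0, 84.0, 0.10),
    ("Any-hand", 50.0, 22.0, 0.15),
    (">=1-hand", 30.0, 22.0, 0.05),
    ("Exactly-2-hand", 20.0, 8.0, 0.10),
]

total = sum(linear_penalty(t, a, w) for _, t, a, w in checks)
score = max(0.0, min(100.0, 100.0 - total))
print(round(total, 1), round(score, 1))  # 11.8 88.2
```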
Colour thresholds
Separate from the deduction targets, colour chips in the report are driven by (good_min, ok_min) pairs. A value ≥ good_min renders green, ≥ ok_min amber, otherwise red.
| Metric | Default (good, ok) | Config field |
|---|---|---|
| Overall health score | (80, 60) | health_thresholds |
| Depth valid % | (80, 50) | depth_valid_thresholds |
| Sync within_50ms_pct (each pair) | (90, 70) | sync_thresholds |
| Any-hand % | (70, 30) | hand_any_thresholds |
| ≥1-hand % | (40, 15) | hand_1plus_thresholds |
| Exactly-2-hand % | (30, 10) | hand_2_thresholds |
| IMU \|gravity\| Δ from 9.81 m/s² | < 0.5 m/s² is green | imu_gravity_max_dev |
The status word next to the score (Good / Watch / Issues) tracks the overall health_thresholds.
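The (good_min, ok_min) rule is small enough to sketch. `chip_colour` below is a hypothetical helper, not the report's rendering code; the threshold pairs are the defaults from the table.

```python
def chip_colour(value: float, thresholds: tuple[float, float]) -> str:
    """Map a metric to green / amber / red using a (good_min, ok_min) pair."""
    good_min, ok_min = thresholds
    if value >= good_min:
        return "green"
    if value >= ok_min:
        return "amber"
    return "red"


health_thresholds = (80.0, 60.0)       # default for the overall health score
depth_valid_thresholds = (80.0, 50.0)  # default for depth valid %

print(chip_colour(88.2, health_thresholds))       # green
print(chip_colour(62.0, depth_valid_thresholds))  # amber
print(chip_colour(45.0, depth_valid_thresholds))  # red
```

The IMU gravity row runs the other way (a smaller deviation is better), so it would need its own comparison rather than this helper.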
Inspecting an active run's config
The report dumps every threshold and weight that produced the score in the Technical reference → Health score config block at the bottom of the page. From Python:
ev = Evaluate(session, config=my_config)
metrics = ev.compute()
metrics["config"] # the EvaluateConfig instance
metrics["health"]["score"]
metrics["health"]["notes"]
Tuning the score for your domain
Defaults assume a hand-manipulation recording on the Stera capture rig. Common adjustments:
Single-handed task. Don't penalise low exactly-2-hand %:
cfg = EvaluateConfig(hand_2_weight=0.0)
Stricter sync requirement. Raise the target to 95 % within 50 ms and double the weight:
cfg = EvaluateConfig(sync_target=95.0, sync_weight=0.2)
Rig without LiDAR depth. Mark depth optional so a missing stream doesn't tank the score:
cfg = EvaluateConfig(depth_required=False)
See the config reference for every field.
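To see what a tuned config does to a deduction, two rows of the worked example can be recomputed by hand. This is plain arithmetic with a hypothetical `linear_penalty` helper, not a call into the library, and it covers a single sync pair.

```python
def linear_penalty(target: float, actual: float, weight: float) -> float:
    """max(0, target - actual) * weight."""
    return max(0.0, target - actual) * weight


sync_pct = 84.0  # RGB-Pose frames within 50 ms, from the worked example

default_sync = linear_penalty(90.0, sync_pct, 0.1)  # default target and weight
strict_sync = linear_penalty(95.0, sync_pct, 0.2)   # sync_target=95.0, sync_weight=0.2
hand_2_off = linear_penalty(20.0, 8.0, 0.0)         # hand_2_weight=0.0 disables the check

print(round(default_sync, 1), round(strict_sync, 1), hand_2_off)  # 0.6 2.2 0.0
```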
Score and notes are derived from the same data the rest of the report shows. If a note doesn't match what you see in the section above, file a bug — they should never disagree.