Health score

The 0–100 rollup at the top of every Evaluate report — how it's computed, what gets deducted, and where the colour thresholds live.

The health score is a deliberately simple deduction model. It starts at 100 and subtracts a penalty for each check that fails: a metric below its target, a required stream that is missing, or RGB frame gaps. The result is clamped to [0, 100]. Each active deduction shows up as a one-line note next to the score in the report.

metrics = Evaluate(session).compute()
metrics["health"]["score"]   # 87.4
metrics["health"]["notes"]   # ["Depth valid 72%", "1 RGB frame gaps"]
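
The model described above fits in a few lines. The following is a minimal sketch under stated assumptions, not the library's implementation: `rollup` and its list of `(note, penalty)` pairs are hypothetical names chosen for illustration.

```python
def rollup(deductions):
    """Start at 100, subtract each active penalty, clamp to [0, 100].

    `deductions` is a list of (note, penalty) pairs. This input shape is an
    assumption made for the sketch, not the library's internal format.
    """
    # Only penalties greater than zero count as active deductions.
    active = [(note, penalty) for note, penalty in deductions if penalty > 0]
    score = 100.0 - sum(penalty for _, penalty in active)
    score = max(0.0, min(100.0, score))  # clamp to [0, 100]
    notes = [note for note, _ in active]
    return score, notes


score, notes = rollup([("depth valid low", 5.4), ("sync on target", 0.0)])
```

Only the first pair contributes here, so `notes` holds a single entry and the score drops by 5.4.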

The ruleset

Every penalty is max(0, target − actual) × weight, with two exceptions: RGB gaps subtract min(gap_count, rgb_gap_max_penalty) directly, and "missing stream" penalties are flat. Setting any *_weight to 0 disables that check entirely.
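
The two formula shapes can be sketched as follows. The function names are hypothetical; only the arithmetic and the `rgb_gap_max_penalty` name come from the text above.

```python
def shortfall_penalty(actual, target, weight):
    """max(0, target - actual) * weight; a weight of 0 disables the check."""
    return max(0.0, target - actual) * weight


def rgb_gap_penalty(gap_count, rgb_gap_max_penalty=10):
    """RGB gaps subtract the gap count directly, capped at the maximum."""
    return min(gap_count, rgb_gap_max_penalty)
```

A metric at or above its target yields a penalty of zero, so the check never adds points back.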

Hand checks use three buckets: ≥1 hand (at-least-one detection), exactly 2 hands, and >2 hands (metric only — typically a detection-error indicator, no deduction by default).

Check	Default penalty	What triggers it
RGB frame gaps	`min(gap_count, 10)`	Inter-frame interval > `2 × median dt`.
Depth valid % below target	`(80 − actual) × 0.3`	`valid_pct_mean` below `depth_valid_target`.
Depth missing	`10` (flat)	No depth stream and `depth_required=True` (default).
Pose missing	`15` (flat)	No `/camera/pose` messages.
IMU missing	`5` (flat)	No `/device/imu` messages and `imu_required=True` (default).
Sync (RGB↔Depth) below target	`(90 − pct) × 0.1`	`within_50ms_pct` below `sync_target`.
Sync (RGB↔Pose) below target	`(90 − pct) × 0.1`	Same calculation, separate deduction.
Any-hand % below target	`(50 − pct) × 0.15`	`frames_with_any_hand_pct` below `hand_any_target`.
≥1-hand % below target	`(30 − pct) × 0.05`	`frames_with_1plus_hand_pct` below `hand_1plus_target`.
Exactly-2-hand % below target	`(20 − pct) × 0.10`	`frames_with_2_hands_pct` below `hand_2_target`.
Hand-pose buffer empty	`0` (off by default)	Set `hand_missing_penalty > 0` to require `session.add_hand_pose`.

All thresholds and weights live on EvaluateConfig. See the config reference for the full field list with defaults.
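
To make the RGB frame gap trigger in the table concrete, here is one way the count could be computed from frame timestamps. This is a sketch of the stated rule (interval > 2 × median dt), not the library's code. `count_rgb_gaps` is a hypothetical helper, and integer millisecond timestamps are assumed.

```python
from statistics import median


def count_rgb_gaps(timestamps_ms, factor=2.0):
    """Count inter-frame intervals longer than factor * median interval."""
    # Intervals between consecutive frames.
    dts = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if not dts:
        return 0  # fewer than two frames: nothing to compare
    limit = factor * median(dts)
    return sum(1 for dt in dts if dt > limit)


# Frames every 100 ms with one 400 ms hole between 300 and 700.
gaps = count_rgb_gaps([0, 100, 200, 300, 700, 800])
```

Here the median interval is 100 ms, so only the 400 ms interval exceeds the 200 ms limit.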

Making streams optional

Depth and IMU are marked mandatory by default: if the recording is missing the topic entirely, the score takes a flat hit (10 and 5 respectively). For headless QC or rigs without a depth sensor, flip them off:

EvaluateConfig(depth_required=False, imu_required=False)

The corresponding *_missing_penalty value is then ignored: the recording can be missing the stream without penalty. The colour-coded chips for valid-depth %, gravity, etc. still render normally when the stream is present.
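
The rule above amounts to a two-condition check. A minimal sketch, where `missing_stream_penalty` is a hypothetical helper and not part of the library:

```python
def missing_stream_penalty(stream_present, required, flat_penalty):
    """Flat deduction only when a required stream is absent."""
    if stream_present or not required:
        return 0.0
    return float(flat_penalty)


# Depth absent: 10 points by default, nothing once depth is optional.
default_hit = missing_stream_penalty(False, required=True, flat_penalty=10)
optional_hit = missing_stream_penalty(False, required=False, flat_penalty=10)
```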

Worked example

Imagine a 12-minute recording where:

Depth valid % averaged 62 % (below the 80 target).
RGB↔Pose sync had 84 % of frames within 50 ms (below the 90 target).
≥1 hand detected on 22 % of frames (below the 30 default target; the any-hand check, default target 50, is scored on the same 22 %).
Exactly 2 hands on 8 % of frames (below the 20 default target).

Defaults would deduct:

Check	Math	Deduction
Depth valid	`(80 − 62) × 0.3`	`5.4`
RGB↔Pose sync	`(90 − 84) × 0.1`	`0.6`
Any-hand	`(50 − 22) × 0.15`	`4.2`
≥1-hand	`(30 − 22) × 0.05`	`0.4`
Exactly-2-hand	`(20 − 8) × 0.10`	`1.2`
Total	`100 − 11.8`	`88.2`
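
The arithmetic in the table can be reproduced with a short script. This is a sketch that hard-codes the default targets and weights listed in the ruleset; `shortfall` is a hypothetical helper, not a library function.

```python
def shortfall(actual, target, weight):
    """max(0, target - actual) * weight, as in the ruleset."""
    return max(0.0, target - actual) * weight


deductions = {
    "Depth valid": shortfall(62, 80, 0.3),
    "RGB↔Pose sync": shortfall(84, 90, 0.1),
    "Any-hand": shortfall(22, 50, 0.15),
    "≥1-hand": shortfall(22, 30, 0.05),
    "Exactly-2-hand": shortfall(8, 20, 0.10),
}

total = sum(deductions.values())
score = max(0.0, min(100.0, 100.0 - total))  # clamp to [0, 100]
print(round(total, 1), round(score, 1))  # 11.8 88.2
```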

Colour thresholds

Separate from the deduction targets, colour chips in the report are driven by (good_min, ok_min) pairs. A value ≥ good_min renders green, ≥ ok_min amber, otherwise red.

Metric	Default `(good, ok)`	Config field
Overall health score	`(80, 60)`	`health_thresholds`
Depth valid %	`(80, 50)`	`depth_valid_thresholds`
Sync `within_50ms_pct` (each pair)	`(90, 70)`	`sync_thresholds`
Any-hand %	`(70, 30)`	`hand_any_thresholds`
≥1-hand %	`(40, 15)`	`hand_1plus_thresholds`
Exactly-2-hand %	`(30, 10)`	`hand_2_thresholds`
IMU |gravity| Δ from 9.81	`< 0.5 m/s²` is green	`imu_gravity_max_dev`

The status word next to the score (Good / Watch / Issues) tracks the overall health_thresholds.
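
The (good_min, ok_min) rule can be sketched as a small classifier. `chip_colour` is a hypothetical name, and the colour strings are illustrative rather than the report's actual values.

```python
def chip_colour(value, thresholds):
    """Map a metric to green / amber / red from a (good_min, ok_min) pair."""
    good_min, ok_min = thresholds
    if value >= good_min:
        return "green"
    if value >= ok_min:
        return "amber"
    return "red"


health = chip_colour(88.2, (80, 60))  # overall health score defaults
depth = chip_colour(62, (80, 50))     # depth valid % defaults
two_hands = chip_colour(8, (30, 10))  # exactly-2-hand % defaults
```

The IMU gravity chip is driven by a maximum deviation rather than a (good, ok) pair, so this sketch does not cover it.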

Inspecting an active run's config

The report dumps every threshold and weight that produced the score in the Technical reference → Health score config block at the bottom of the page. From Python:

ev = Evaluate(session, config=my_config)
metrics = ev.compute()
metrics["config"]      # the EvaluateConfig instance
metrics["health"]["score"]
metrics["health"]["notes"]

Tuning the score for your domain

Defaults assume a hand-manipulation recording on the Stera capture rig. Common adjustments:

Single-handed task — Don't penalise low exactly-2-hand %:

cfg = EvaluateConfig(hand_2_weight=0.0)

Stricter sync requirement — Bump the target to 95 % within 50 ms and double the weight from the default 0.1 to 0.2:

cfg = EvaluateConfig(sync_target=95.0, sync_weight=0.2)

Rig without LiDAR depth — Mark depth optional so a missing stream doesn't tank the score:

cfg = EvaluateConfig(depth_required=False)
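
To gauge what a tuning change does before re-running a report, the deduction formula can be evaluated by hand. A sketch of the stricter sync setting applied to a recording with 84 % of frames within 50 ms; `shortfall` is a hypothetical helper mirroring the ruleset formula.

```python
def shortfall(actual, target, weight):
    """max(0, target - actual) * weight, as in the ruleset."""
    return max(0.0, target - actual) * weight


pct = 84.0
default_deduction = shortfall(pct, target=90.0, weight=0.1)
strict_deduction = shortfall(pct, target=95.0, weight=0.2)
print(round(default_deduction, 1), round(strict_deduction, 1))  # 0.6 2.2
```

This deduction applies separately to each sync pair that falls below the target.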

See the config reference for every field.

Score and notes are derived from the same data the rest of the report shows. If a note doesn't match what you see in the section above, file a bug — they should never disagree.