Health score

The 0–100 rollup at the top of every Evaluate report — how it's computed, what gets deducted, and where the colour thresholds live.

The health score is a deliberately simple deduction model. It starts at 100 and subtracts a penalty for each check that falls short of its target or whose required stream is missing. The result is clamped to [0, 100]. Each active deduction shows up as a one-line note next to the score in the report.

metrics = Evaluate(session).compute()
metrics["health"]["score"]   # 87.4
metrics["health"]["notes"]   # ["Depth valid 72%", "1 RGB frame gaps"]
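
The rollup itself can be pictured with a few lines of plain Python. This is only a sketch of the start-at-100, subtract, clamp behaviour described above. The helper name roll_up and the (note, penalty) pair shape are illustrative assumptions, not the library's internal API.

```python
def roll_up(deductions):
    """Return (score, notes) for a list of (note, penalty) pairs.

    Hypothetical helper: the name and the input shape are assumptions
    made for illustration only.
    """
    # Only checks with a positive penalty count as active deductions.
    active = [(note, penalty) for note, penalty in deductions if penalty > 0]
    raw = 100.0 - sum(penalty for _, penalty in active)
    score = max(0.0, min(100.0, raw))  # clamp to [0, 100]
    notes = [note for note, _ in active]
    return score, notes


# Two flat penalties (15 and 5) and one check that passed.
score, notes = roll_up([("pose missing", 15.0), ("imu missing", 5.0), ("sync ok", 0.0)])
print(score, notes)  # 80.0 ['pose missing', 'imu missing']
```

The clamp matters only in extreme cases: deductions that sum to more than 100 still yield a score of 0 rather than a negative number.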

The ruleset

Every penalty is max(0, target − actual) × weight, with two exceptions: RGB gaps subtract min(gap_count, rgb_gap_max_penalty) directly, and "missing stream" penalties are flat. Setting any *_weight to 0 disables that check entirely.
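
The two formulas in the paragraph above can be written out directly. The function names below are hypothetical and chosen for readability. Only the arithmetic and the rgb_gap_max_penalty default of 10 come from this page.

```python
def shortfall_penalty(actual, target, weight):
    """max(0, target - actual) * weight; zero when the target is met."""
    return max(0.0, target - actual) * weight


def rgb_gap_penalty(gap_count, rgb_gap_max_penalty=10):
    """RGB gaps subtract one point each, capped at rgb_gap_max_penalty."""
    return min(gap_count, rgb_gap_max_penalty)


below_target = shortfall_penalty(actual=72, target=80, weight=0.3)   # about 2.4
above_target = shortfall_penalty(actual=95, target=80, weight=0.3)   # 0.0
disabled = shortfall_penalty(actual=10, target=80, weight=0.0)       # 0.0
few_gaps = rgb_gap_penalty(3)                                        # 3
many_gaps = rgb_gap_penalty(42)                                      # 10
```

A weight of 0 multiplies any shortfall to zero, which is why it disables the check.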

Hand checks use three buckets: ≥1 hand (at-least-one detection), exactly 2 hands, and >2 hands (metric only — typically a detection-error indicator, no deduction by default).

Check                          Default penalty      What triggers it
-----------------------------  -------------------  ----------------
RGB frame gaps                 min(gap_count, 10)   Inter-frame interval > 2 × median dt.
Depth valid % below target     (80 − actual) × 0.3  valid_pct_mean below depth_valid_target.
Depth missing                  −10 (flat)           No depth stream and depth_required=True (default).
Pose missing                   −15 (flat)           No /camera/pose messages.
IMU missing                    −5 (flat)            No /device/imu messages and imu_required=True (default).
Sync (RGB↔Depth) below target  (90 − pct) × 0.1     within_50ms_pct below sync_target.
Sync (RGB↔Pose) below target   (90 − pct) × 0.1     Same calculation, separate deduction.
Any-hand % below target        (50 − pct) × 0.15    frames_with_any_hand_pct below hand_any_target.
≥1-hand % below target         (30 − pct) × 0.05    frames_with_1plus_hand_pct below hand_1plus_target.
Exactly-2-hand % below target  (20 − pct) × 0.10    frames_with_2_hands_pct below hand_2_target.
Hand-pose buffer empty         0 (off by default)   Set hand_missing_penalty > 0 to require session.add_hand_pose.

All thresholds and weights live on EvaluateConfig. See the config reference for the full field list with defaults.

Making streams optional

Depth and IMU are marked mandatory by default: if the recording is missing the topic entirely, the score takes a flat hit (10 and 5 respectively). For headless QC or rigs without a depth sensor, flip them off:

EvaluateConfig(depth_required=False, imu_required=False)

The corresponding _missing_penalty value is then ignored — the recording can be missing the stream without penalty. The colour-coded chips for valid-depth %, gravity, etc. still render normally when the stream is present.
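
One way to picture the gating: a flat penalty applies only when the stream is both absent and required. The helper below is a hypothetical sketch of that rule, not the library's code. The penalty values of 10 and 5 are the defaults quoted above.

```python
def missing_stream_penalty(stream_present, required, missing_penalty):
    """Flat deduction for a required stream that is absent, else 0.

    Hypothetical helper; the name and signature are assumptions.
    """
    if stream_present or not required:
        return 0.0
    return float(missing_penalty)


depth_default = missing_stream_penalty(stream_present=False, required=True, missing_penalty=10)    # 10.0
depth_optional = missing_stream_penalty(stream_present=False, required=False, missing_penalty=10)  # 0.0
imu_present = missing_stream_penalty(stream_present=True, required=True, missing_penalty=5)        # 0.0
```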

Worked example

Imagine a 12-minute recording where:

  • Depth valid % averaged 62 % (below the 80 target).
  • RGB↔Pose sync had 84 % of frames within 50 ms (below the 90 target).
  • ≥1 hand detected on 22 % of frames (below the 30 default target). The any-hand check is scored on the same 22 %, against its 50 default target.
  • Exactly 2 hands on 8 % of frames (below the 20 default target).

Defaults would deduct:

Check           Math              Deduction
--------------  ----------------  ---------
Depth valid     (80 − 62) × 0.3   5.4
RGB↔Pose sync   (90 − 84) × 0.1   0.6
Any-hand        (50 − 22) × 0.15  4.2
≥1-hand         (30 − 22) × 0.05  0.4
Exactly-2-hand  (20 − 8) × 0.10   1.2
Total           100 − 11.8        88.2
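
The same arithmetic can be checked in a few lines of plain Python. This sketch hard-codes the default targets and weights from the ruleset table and does not call the library. The tuple layout is an assumption made for illustration.

```python
# (check name, target, measured value, weight), all taken from the example.
checks = [
    ("Depth valid", 80, 62, 0.3),
    ("RGB<->Pose sync", 90, 84, 0.1),
    ("Any-hand", 50, 22, 0.15),
    (">=1-hand", 30, 22, 0.05),
    ("Exactly-2-hand", 20, 8, 0.10),
]

# max(0, target - actual) * weight for each check.
deductions = {
    name: max(0, target - actual) * weight
    for name, target, actual, weight in checks
}

total = sum(deductions.values())
score = max(0.0, min(100.0, 100 - total))  # clamp to [0, 100]

print(round(total, 1), round(score, 1))  # 11.8 88.2
```

Values are rounded for display because binary floating point does not represent products such as 18 × 0.3 exactly.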

Colour thresholds

Separate from the deduction targets, colour chips in the report are driven by (good_min, ok_min) pairs. A value ≥ good_min renders green, ≥ ok_min amber, otherwise red.

Metric                            Default (good, ok)   Config field
--------------------------------  -------------------  ------------
Overall health score              (80, 60)             health_thresholds
Depth valid %                     (80, 50)             depth_valid_thresholds
Sync within_50ms_pct (each pair)  (90, 70)             sync_thresholds
Any-hand %                        (70, 30)             hand_any_thresholds
≥1-hand %                         (40, 15)             hand_1plus_thresholds
Exactly-2-hand %                  (30, 10)             hand_2_thresholds
IMU |gravity| Δ from 9.81         < 0.5 m/s² is green  imu_gravity_max_dev

The status word next to the score (Good / Watch / Issues) tracks the overall health_thresholds.
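
The (good_min, ok_min) rule is a two-step comparison, sketched below with a hypothetical chip_colour helper. The mapping from colour to status word (green to Good, amber to Watch, red to Issues) is an assumption: this page only says that the word tracks health_thresholds. The IMU gravity chip uses a maximum deviation instead of a pair and is not covered here.

```python
def chip_colour(value, thresholds):
    """Classify a higher-is-better metric against a (good_min, ok_min) pair."""
    good_min, ok_min = thresholds
    if value >= good_min:
        return "green"
    if value >= ok_min:
        return "amber"
    return "red"


# Assumed mapping from chip colour to the status word next to the score.
STATUS_WORD = {"green": "Good", "amber": "Watch", "red": "Issues"}

health_colour = chip_colour(88.2, (80, 60))   # default health_thresholds
depth_colour = chip_colour(62, (80, 50))      # default depth_valid_thresholds
two_hand_colour = chip_colour(8, (30, 10))    # default hand_2_thresholds

print(health_colour, STATUS_WORD[health_colour], depth_colour, two_hand_colour)
# green Good amber red
```

Note that a metric can draw a deduction and still render amber or even green, because the deduction targets and the colour thresholds are configured separately.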

Inspecting an active run's config

The report dumps every threshold and weight that produced the score in the Technical reference → Health score config block at the bottom of the page. From Python:

ev = Evaluate(session, config=my_config)
metrics = ev.compute()
metrics["config"]      # the EvaluateConfig instance
metrics["health"]["score"]
metrics["health"]["notes"]

Tuning the score for your domain

Defaults assume a hand-manipulation recording on the Stera capture rig. Common adjustments:

Single-handed task. Don't penalise a low exactly-2-hand %:

cfg = EvaluateConfig(hand_2_weight=0.0)

Stricter sync requirement. Bump the target to 95 % within 50 ms and double the weight from its 0.1 default to 0.2:

cfg = EvaluateConfig(sync_target=95.0, sync_weight=0.2)

Rig without LiDAR depth. Mark depth optional so a missing stream doesn't tank the score:

cfg = EvaluateConfig(depth_required=False)

See the config reference for every field.
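
To get a feel for how much each adjustment moves the score, the deductions can be recomputed by hand before touching a real config. The sketch below uses a plain dict of (target, weight) pairs instead of EvaluateConfig. The dict keys and the measured values (62 % depth valid, 84 % sync, 8 % two-hand frames) are illustrative assumptions.

```python
# (target, weight) per check, using the defaults quoted on this page.
DEFAULTS = {
    "depth_valid": (80.0, 0.3),
    "sync": (90.0, 0.1),
    "hand_2": (20.0, 0.10),
}

# Illustrative measurements for one recording.
measured = {"depth_valid": 62.0, "sync": 84.0, "hand_2": 8.0}


def total_deduction(rules, values):
    """Sum max(0, target - actual) * weight over every rule."""
    return sum(
        max(0.0, target - values[name]) * weight
        for name, (target, weight) in rules.items()
    )


baseline = total_deduction(DEFAULTS, measured)
single_handed = total_deduction({**DEFAULTS, "hand_2": (20.0, 0.0)}, measured)
strict_sync = total_deduction({**DEFAULTS, "sync": (95.0, 0.2)}, measured)

print(round(baseline, 1), round(single_handed, 1), round(strict_sync, 1))
# 7.2 6.0 8.8
```

Here, zeroing the two-hand weight recovers 1.2 points, while the stricter sync settings raise the sync deduction from 0.6 to 2.2.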

Score and notes are derived from the same data the rest of the report shows. If a note doesn't match what you see in the corresponding report section, file a bug: the two should never disagree.