EvaluateConfig

Every tunable threshold and penalty weight for the Evaluate health score, plus override recipes.

from stera import Evaluate, EvaluateConfig

cfg = EvaluateConfig(
    sync_target=95.0,
    hand_2_weight=0.0,
    depth_required=False,
)
Evaluate(session, config=cfg).show()

EvaluateConfig is a frozen-by-convention dataclass with three kinds of fields:

  1. Color thresholds: (good_min, ok_min) pairs that drive the green / amber / red chips in the report.
  2. Health-score deductions: target + weight pairs that drive the 0–100 score. Set any weight to 0 to disable that check.
  3. Required-stream flags: booleans (depth_required, imu_required) that gate the corresponding *_missing_penalty. Both default to True.
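
Because the dataclass is frozen only by convention, nothing prevents in-place mutation. One way to respect the convention is to derive variants with dataclasses.replace rather than assigning to fields. The sketch below uses a hypothetical stand-in class, DemoConfig, that carries three of the documented fields and their defaults; it assumes only that EvaluateConfig behaves like a regular dataclass.

```python
from dataclasses import dataclass, replace


@dataclass
class DemoConfig:
    """Hypothetical stand-in for EvaluateConfig (three documented fields only)."""
    sync_target: float = 90.0
    hand_2_weight: float = 0.10
    depth_required: bool = True


base = DemoConfig()

# replace() builds a new instance; `base` itself is left untouched.
strict = replace(base, sync_target=95.0)

print(base.sync_target, strict.sync_target)
```

If the same pattern holds for the real class, a shared project-wide config can be specialised per recording without side effects on other callers.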

Hand checks have three buckets:

  • hand_1plus_* — frames with at least 1 hand (= any-hand).
  • hand_2_* — frames with exactly 2 hands.
  • frames_with_more_hands* — frames with strictly more than 2 hands. Metric only; no config knobs because it's a detection-error indicator rather than a quality signal.
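
The buckets overlap: a frame with two hands counts toward both "at least 1" and "exactly 2". A minimal sketch of how such percentages could be derived from per-frame hand counts follows. The function name and the dictionary keys are illustrative assumptions, not the SDK's metric names.

```python
from typing import Dict, Sequence


def hand_bucket_percentages(hands_per_frame: Sequence[int]) -> Dict[str, float]:
    """Percentage of frames in each hand bucket (keys are hypothetical)."""
    total = len(hands_per_frame)
    if total == 0:
        return {"at_least_1": 0.0, "exactly_2": 0.0, "more_than_2": 0.0}
    at_least_1 = sum(1 for n in hands_per_frame if n >= 1)
    exactly_2 = sum(1 for n in hands_per_frame if n == 2)
    more_than_2 = sum(1 for n in hands_per_frame if n > 2)
    return {
        "at_least_1": 100.0 * at_least_1 / total,
        "exactly_2": 100.0 * exactly_2 / total,
        "more_than_2": 100.0 * more_than_2 / total,
    }


# Five frames with 0, 1, 2, 2 and 3 detected hands.
buckets = hand_bucket_percentages([0, 1, 2, 2, 3])
print(buckets)
```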

Defaults at a glance

EvaluateConfig(
    # color thresholds (good_min, ok_min)
    health_thresholds       = (80.0, 60.0),
    depth_valid_thresholds  = (80.0, 50.0),
    sync_thresholds         = (90.0, 70.0),
    hand_any_thresholds     = (70.0, 30.0),
    hand_1plus_thresholds   = (40.0, 15.0),
    hand_2_thresholds       = (30.0, 10.0),
    imu_gravity_max_dev     = 0.5,        # m/s² Δ from 9.81 for green

    # required-stream flags (True = penalise when missing)
    depth_required          = True,
    imu_required            = True,

    # health-score deductions
    rgb_gap_max_penalty     = 10.0,
    depth_valid_target      = 80.0,
    depth_valid_weight      = 0.3,
    depth_missing_penalty   = 10.0,
    pose_missing_penalty    = 15.0,
    imu_missing_penalty     = 5.0,
    sync_target             = 90.0,
    sync_weight             = 0.1,
    hand_any_target         = 50.0,
    hand_any_weight         = 0.15,
    hand_1plus_target       = 30.0,
    hand_1plus_weight       = 0.05,
    hand_2_target           = 20.0,
    hand_2_weight           = 0.10,
    hand_missing_penalty    = 0.0,
)

Color thresholds

A value ≥ good_min renders green, ≥ ok_min amber, otherwise red. Higher is always better.

Field                   Default    Drives
health_thresholds       (80, 60)   The big score number in the Summary and the Good / Watch / Issues status word.
depth_valid_thresholds  (80, 50)   "Valid mean" chip in the Depth section.
sync_thresholds         (90, 70)   The three RGB↔X chips in the Sync section.
hand_any_thresholds     (70, 30)   "Any hand" chip in the Hands section.
hand_1plus_thresholds   (40, 15)   "≥1 hand" chip.
hand_2_thresholds       (30, 10)   "2 hands" chip (exactly two).

imu_gravity_max_dev (default 0.5 m/s²) is a single number, not a pair. The |gravity| chip in the IMU section is green when |9.81 − gravity_magnitude| ≤ imu_gravity_max_dev, amber otherwise.
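
The two coloring rules described in this section can be sketched as follows. The function names are hypothetical; only the comparisons come from the text above.

```python
from typing import Tuple


def chip_color(value: float, thresholds: Tuple[float, float]) -> str:
    """Map a percentage to a chip color using a (good_min, ok_min) pair."""
    good_min, ok_min = thresholds
    if value >= good_min:
        return "green"
    if value >= ok_min:
        return "amber"
    return "red"


def gravity_chip_color(gravity_magnitude: float, max_dev: float = 0.5) -> str:
    """Green when the measured |gravity| is within max_dev of 9.81 m/s², else amber."""
    return "green" if abs(9.81 - gravity_magnitude) <= max_dev else "amber"


print(chip_color(75.0, (90.0, 70.0)))  # sync value checked against sync_thresholds
print(gravity_chip_color(9.6))
```

Note that the gravity chip has no red state in this description, so the sketch returns only green or amber.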

Required-stream flags

Two booleans control whether a missing stream triggers its flat *_missing_penalty. Both default to True — the score will drop when these streams are absent. Flip to False to make the stream optional for the score (the corresponding sections in the report still render as "no data" either way).

Field           Default  Effect when False
depth_required  True     depth_missing_penalty is skipped even if the depth stream is absent.
imu_required    True     imu_missing_penalty is skipped even if the IMU stream is absent.

Pose is mandatory unconditionally (the SDK assumes ARKit pose is always present for an MCAP from the Stera App). If you need to make pose optional, set pose_missing_penalty=0.
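
A minimal sketch of how the flags gate the flat penalties, using the documented defaults. The function and its has_* parameters are illustrative assumptions; the SDK's own logic may be structured differently.

```python
def missing_stream_penalty(
    has_depth: bool,
    has_pose: bool,
    has_imu: bool,
    *,
    depth_required: bool = True,
    imu_required: bool = True,
    depth_missing_penalty: float = 10.0,
    pose_missing_penalty: float = 15.0,
    imu_missing_penalty: float = 5.0,
) -> float:
    """Sum the flat penalties for absent streams (hypothetical helper)."""
    penalty = 0.0
    if not has_depth and depth_required:
        penalty += depth_missing_penalty
    if not has_pose:  # pose has no *_required flag; zero its penalty to opt out
        penalty += pose_missing_penalty
    if not has_imu and imu_required:
        penalty += imu_missing_penalty
    return penalty


# Recording with pose only, scored with defaults and with both flags off.
print(missing_stream_penalty(False, True, False))
print(missing_stream_penalty(False, True, False, depth_required=False, imu_required=False))
```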

Health-score deductions

Field                  Default  What it does
rgb_gap_max_penalty    10       Caps the RGB-frame-gap deduction. Each detected gap subtracts 1 up to this cap. Set to 0 to disable.
depth_valid_target     80       Below this, deducts (target − actual) × depth_valid_weight.
depth_valid_weight     0.3      Scales the depth deduction. 0 disables.
depth_missing_penalty  10       Flat deduction when there is no depth stream and depth_required=True.
pose_missing_penalty   15       Flat deduction when /camera/pose is missing.
imu_missing_penalty    5        Flat deduction when /device/imu is missing and imu_required=True.
sync_target            90       within_50ms_pct threshold per sync pair.
sync_weight            0.1      Per-pair deduction (target − pct) × weight. Applied separately to RGB↔Depth and RGB↔Pose.
hand_any_target        50       Below this, deducts (target − any_pct) × hand_any_weight.
hand_any_weight        0.15
hand_1plus_target      30       Same shape for frames with ≥1 hand.
hand_1plus_weight      0.05
hand_2_target          20       Same shape for frames with exactly 2 hands.
hand_2_weight          0.10
hand_missing_penalty   0        Flat deduction when session.add_hand_pose was never called. Off by default so headless QC of recordings without hand detection still scores 100.

The full deduction logic lives in stera.eval.metrics.compute_health. See Health score for the formulae and a worked example.
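
As a rough illustration of the table above, the target + weight deductions and the gap cap can be sketched like this. This is not the code in compute_health: the helper names and the input measurements are made up, and the sketch assumes the score starts at 100 with deductions simply subtracted.

```python
def shortfall_deduction(actual_pct: float, target: float, weight: float) -> float:
    """(target - actual) * weight when actual is below target, otherwise 0."""
    return max(0.0, target - actual_pct) * weight


def gap_deduction(gap_count: int, max_penalty: float = 10.0) -> float:
    """One point per detected RGB gap, capped at max_penalty."""
    return min(float(gap_count), max_penalty)


# Illustrative measurements (invented for this example, not SDK output),
# evaluated against the default targets and weights.
deductions = {
    "rgb_gaps": gap_deduction(3),
    "depth_valid": shortfall_deduction(70.0, 80.0, 0.3),
    "sync_rgb_depth": shortfall_deduction(85.0, 90.0, 0.1),
    "sync_rgb_pose": shortfall_deduction(95.0, 90.0, 0.1),
    "hand_any": shortfall_deduction(40.0, 50.0, 0.15),
    "hand_1plus": shortfall_deduction(30.0, 30.0, 0.05),
    "hand_2": shortfall_deduction(10.0, 20.0, 0.10),
}
total = sum(deductions.values())

# Assumption: score = 100 minus the summed deductions.
score = 100.0 - total
print(round(total, 2), round(score, 2))
```

With these inputs the deductions come to roughly 9 points, most of it from the three RGB gaps and the 10-point depth shortfall. Metrics at or above their target contribute nothing.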

Override recipes

Stricter sync

EvaluateConfig(sync_target=95.0, sync_weight=0.2)

Raises the bar to 95% of RGB frames within 50 ms of their nearest depth/pose sample, and doubles the weight from 0.1 to 0.2 so that each percentage point of sync shortfall costs twice as much.
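
Raising the target and the weight together compounds. A small worked comparison, assuming an illustrative within_50ms_pct of 88 on one sync pair and the per-pair formula (target − pct) × weight:

```python
def sync_deduction(pct: float, target: float, weight: float) -> float:
    """Per-pair sync deduction; zero when pct meets the target."""
    return max(0.0, target - pct) * weight


pct = 88.0  # illustrative value, not SDK output
default_hit = sync_deduction(pct, target=90.0, weight=0.1)
strict_hit = sync_deduction(pct, target=95.0, weight=0.2)
print(round(default_hit, 2), round(strict_hit, 2))
```

Under these assumptions the same recording loses about 0.2 points per pair with the defaults and about 1.4 with the stricter config, so the effect is larger than a plain doubling.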

Single-handed task

EvaluateConfig(hand_2_weight=0.0, hand_2_thresholds=(0.0, 0.0))

Stops penalising low exactly-2-hand frame counts and, because every percentage is ≥ 0, makes the "2 hands" chip always render green. Useful for recordings where one hand is holding the camera.

Rig without depth / IMU

EvaluateConfig(depth_required=False, imu_required=False)

Marks both streams optional. A recording from a non-LiDAR phone or an external camera no longer takes the flat 10 + 5 hit for missing those topics.

Headless QC: only fail on missing streams

EvaluateConfig(
    depth_valid_weight=0.0,
    sync_weight=0.0,
    hand_any_weight=0.0,
    hand_1plus_weight=0.0,
    hand_2_weight=0.0,
    rgb_gap_max_penalty=0.0,
)

Disables every threshold-based check. Score only drops when a whole stream is missing (depth, pose, IMU), which is the bare minimum you'd want in a CI pipeline.

Require hand annotations

EvaluateConfig(hand_missing_penalty=20.0)

Run this after a detection pipeline that should have called session.add_hand_pose. If the buffer is empty (the detection loop never wrote any hand poses), the score drops by 20.

Inspecting the active config

Evaluate.compute() puts the config it ran with on the metrics dict, and the report dumps the same in Technical reference → Health score config:

metrics = Evaluate(session, config=my_cfg).compute()
assert metrics["config"] is my_cfg
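
When reviewing a run, it can help to list only the fields that differ from the defaults. The sketch below shows one way to do that for any dataclass that can be constructed with no arguments. DemoConfig is a hypothetical stand-in with three of the documented fields; whether the helper works unchanged on metrics["config"] depends on EvaluateConfig meeting that assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class DemoConfig:
    """Hypothetical stand-in for EvaluateConfig (three documented fields only)."""
    sync_target: float = 90.0
    sync_weight: float = 0.1
    depth_required: bool = True


def overrides(cfg: Any) -> Dict[str, Any]:
    """Return {field: value} for every field that differs from the class defaults."""
    defaults = type(cfg)()  # assumes a no-argument constructor
    return {
        f.name: getattr(cfg, f.name)
        for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(defaults, f.name)
    }


changed = overrides(DemoConfig(sync_target=95.0, depth_required=False))
print(changed)
```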

Future revisions may add new fields (e.g. tracking-state penalties). They'll land with safe defaults, so existing configs will continue to score the same way unless you opt in.