EvaluateConfig

Every tunable threshold and penalty weight for the Evaluate health score, plus override recipes.

from stera import Evaluate, EvaluateConfig

cfg = EvaluateConfig(
    sync_target=95.0,
    hand_2_weight=0.0,
    depth_required=False,
)
Evaluate(session, config=cfg).show()

EvaluateConfig is a frozen-by-convention dataclass with three kinds of fields:

Color thresholds — (good_min, ok_min) pairs that drive the green / amber / red chips in the report.
Health-score deductions — targets, weights, caps and flat penalties that drive the 0–100 score. Set any weight or penalty to 0 to disable that check.
Required-stream flags — booleans (depth_required, imu_required) that gate the corresponding *_missing_penalty. Both default to True.
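Because EvaluateConfig is a dataclass, the standard library's `dataclasses.replace` should be a convenient way to derive a variant without touching a shared instance. The sketch below uses a hypothetical `MiniEvaluateConfig` stand-in that copies a few documented fields and defaults (one per kind of field), so it runs without stera installed. With the real class you would pass an `EvaluateConfig()` instance to `replace` in the same way.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in mirroring a few EvaluateConfig fields and their
# documented defaults. It is NOT the real stera class.
@dataclass
class MiniEvaluateConfig:
    sync_thresholds: tuple[float, float] = (90.0, 70.0)  # color threshold pair
    sync_target: float = 90.0                            # health-score deduction
    sync_weight: float = 0.1
    depth_required: bool = True                          # required-stream flag


base = MiniEvaluateConfig()

# replace() builds a new instance, overriding only the named fields and
# leaving `base` untouched, which suits a frozen-by-convention config.
strict = replace(base, sync_target=95.0, sync_weight=0.2)

print(base.sync_target, strict.sync_target)  # 90.0 95.0
print(strict.sync_thresholds)                # (90.0, 70.0)
```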

Hand checks have three buckets:

hand_1plus_* — frames with at least 1 hand (= any-hand).
hand_2_* — frames with exactly 2 hands.
frames_with_more_hands* — frames with strictly more than 2 hands. Metric only; no config knobs because it's a detection-error indicator rather than a quality signal.
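The bucket definitions can be made concrete with a minimal sketch, assuming you have one detected-hand count per frame. The function and key names are hypothetical, not stera's. Note that the buckets overlap: a frame with 3 hands counts toward "at least 1" and "more than 2", but not toward "exactly 2".

```python
def hand_buckets(hand_counts: list[int]) -> dict[str, float]:
    """Percent of frames in each documented hand bucket (illustrative only)."""
    n = len(hand_counts)
    if n == 0:
        return {"hand_1plus_pct": 0.0, "hand_2_pct": 0.0, "more_hands_pct": 0.0}

    def pct(k: int) -> float:
        return 100.0 * k / n

    return {
        "hand_1plus_pct": pct(sum(1 for c in hand_counts if c >= 1)),  # at least 1
        "hand_2_pct": pct(sum(1 for c in hand_counts if c == 2)),      # exactly 2
        "more_hands_pct": pct(sum(1 for c in hand_counts if c > 2)),   # strictly > 2
    }


print(hand_buckets([0, 1, 2, 2, 3]))
# {'hand_1plus_pct': 80.0, 'hand_2_pct': 40.0, 'more_hands_pct': 20.0}
```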

Defaults at a glance

EvaluateConfig(
    # color thresholds (good_min, ok_min)
    health_thresholds       = (80.0, 60.0),
    depth_valid_thresholds  = (80.0, 50.0),
    sync_thresholds         = (90.0, 70.0),
    hand_any_thresholds     = (70.0, 30.0),
    hand_1plus_thresholds   = (40.0, 15.0),
    hand_2_thresholds       = (30.0, 10.0),
    imu_gravity_max_dev     = 0.5,        # m/s² Δ from 9.81 for green

    # required-stream flags (True = penalise when missing)
    depth_required          = True,
    imu_required            = True,

    # health-score deductions
    rgb_gap_max_penalty     = 10.0,
    depth_valid_target      = 80.0,
    depth_valid_weight      = 0.3,
    depth_missing_penalty   = 10.0,
    pose_missing_penalty    = 15.0,
    imu_missing_penalty     = 5.0,
    sync_target             = 90.0,
    sync_weight             = 0.1,
    hand_any_target         = 50.0,
    hand_any_weight         = 0.15,
    hand_1plus_target       = 30.0,
    hand_1plus_weight       = 0.05,
    hand_2_target           = 20.0,
    hand_2_weight           = 0.10,
    hand_missing_penalty    = 0.0,
)

Color thresholds

A value ≥ good_min renders green, ≥ ok_min amber, otherwise red. Higher is always better.

Field	Default	Drives
`health_thresholds`	`(80, 60)`	The big score number in the Summary and the `Good / Watch / Issues` status word.
`depth_valid_thresholds`	`(80, 50)`	"Valid mean" chip in the Depth section.
`sync_thresholds`	`(90, 70)`	The three RGB↔X chips in the Sync section.
`hand_any_thresholds`	`(70, 30)`	"Any hand" chip in the Hands section.
`hand_1plus_thresholds`	`(40, 15)`	"≥1 hand" chip.
`hand_2_thresholds`	`(30, 10)`	"2 hands" chip (exactly two).
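The green / amber / red rule above fits in a few lines. This is a sketch of the documented rule, not the SDK's code, and `chip_color` is a hypothetical name. Both comparisons are inclusive, so a value exactly at `good_min` is green.

```python
def chip_color(value: float, thresholds: tuple[float, float]) -> str:
    """Map a higher-is-better value to a chip colour using (good_min, ok_min)."""
    good_min, ok_min = thresholds
    if value >= good_min:
        return "green"
    if value >= ok_min:
        return "amber"
    return "red"


# With the documented sync_thresholds default of (90, 70):
for pct in (95.0, 90.0, 75.0, 69.9):
    print(pct, chip_color(pct, (90.0, 70.0)))
# prints one line each: 95.0 green, 90.0 green, 75.0 amber, 69.9 red
```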

imu_gravity_max_dev (default 0.5 m/s²) is a single number, not a pair. The |gravity| chip in the IMU section is green when |9.81 − gravity_magnitude| ≤ imu_gravity_max_dev, amber otherwise.
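A minimal sketch of the |gravity| chip follows. It assumes `gravity_magnitude` is the norm of a single accelerometer vector in m/s². How the SDK actually aggregates IMU samples into that number isn't described on this page, and `gravity_chip` is a hypothetical name. Unlike the pair thresholds, lower deviation is better here, and there is no red state.

```python
import math

STANDARD_GRAVITY = 9.81  # m/s², the reference value used by the chip


def gravity_chip(accel_xyz: tuple[float, float, float], max_dev: float = 0.5) -> str:
    """Green if |9.81 - |a|| <= max_dev, amber otherwise."""
    magnitude = math.hypot(*accel_xyz)  # Euclidean norm of the 3-vector
    return "green" if abs(STANDARD_GRAVITY - magnitude) <= max_dev else "amber"


print(gravity_chip((0.0, 0.0, 9.7)))   # green (off by 0.11)
print(gravity_chip((0.0, 0.0, 10.4)))  # amber (off by 0.59)
```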

Required-stream flags

Two booleans control whether a missing stream triggers its flat *_missing_penalty. Both default to True, so the score drops when these streams are absent. Set one to False to make that stream optional for the score; the corresponding report section still renders as "no data" either way.

Field	Default	Effect when `False`
`depth_required`	`True`	`depth_missing_penalty` is skipped even if the depth stream is absent.
`imu_required`	`True`	`imu_missing_penalty` is skipped even if the IMU stream is absent.

Pose is always required: there is no pose_required flag, because the SDK assumes ARKit pose is always present in an MCAP from the Stera App. If you need to make pose optional, set pose_missing_penalty=0.
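The gating rules for the flat penalties can be sketched as a plain function. The function and its `has_*` parameters are hypothetical; only the penalty names and defaults come from this page. Depth and IMU are gated by their flags, while pose is checked unconditionally.

```python
def missing_stream_penalty(
    has_depth: bool,
    has_pose: bool,
    has_imu: bool,
    *,
    depth_required: bool = True,
    imu_required: bool = True,
    depth_missing_penalty: float = 10.0,
    pose_missing_penalty: float = 15.0,
    imu_missing_penalty: float = 5.0,
) -> float:
    """Sum of flat missing-stream deductions under the documented rules."""
    penalty = 0.0
    if not has_depth and depth_required:
        penalty += depth_missing_penalty
    if not has_pose:  # no pose_required flag: always checked
        penalty += pose_missing_penalty
    if not has_imu and imu_required:
        penalty += imu_missing_penalty
    return penalty


print(missing_stream_penalty(False, True, True))                        # 10.0
print(missing_stream_penalty(False, True, True, depth_required=False))  # 0.0
print(missing_stream_penalty(True, False, True, depth_required=False))  # 15.0
```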

Health-score deductions

Field	Default	What it does
`rgb_gap_max_penalty`	`10`	Caps the RGB-frame-gap deduction. Each detected gap subtracts 1 up to this cap. Set to `0` to disable.
`depth_valid_target`	`80`	Below this, deducts `(target − actual) × depth_valid_weight`.
`depth_valid_weight`	`0.3`	Scales the depth deduction. `0` disables.
`depth_missing_penalty`	`10`	Flat deduction when there is no depth stream and `depth_required=True`.
`pose_missing_penalty`	`15`	Flat deduction when `/camera/pose` is missing.
`imu_missing_penalty`	`5`	Flat deduction when `/device/imu` is missing and `imu_required=True`.
`sync_target`	`90`	Target for `within_50ms_pct` (percentage of RGB frames within 50 ms of their nearest sample) on each sync pair.
`sync_weight`	`0.1`	Per-pair deduction `(target − pct) × weight`. Applied separately to RGB↔Depth and RGB↔Pose.
`hand_any_target`	`50`	Below this, deducts `(target − any_pct) × hand_any_weight`.
`hand_any_weight`	`0.15`	Scales the any-hand deduction. `0` disables.
`hand_1plus_target`	`30`	Same shape for frames with ≥1 hand.
`hand_1plus_weight`	`0.05`	Scales the ≥1-hand deduction. `0` disables.
`hand_2_target`	`20`	Same shape for frames with exactly 2 hands.
`hand_2_weight`	`0.10`	Scales the exactly-2-hands deduction. `0` disables.
`hand_missing_penalty`	`0`	Flat deduction when `session.add_hand_pose` was never called. Off by default, so headless QC of recordings without hand detection takes no penalty for it.

The full deduction logic lives in stera.eval.metrics.compute_health. See Health score for the formulae and a worked example.
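To get a rough feel for how the shapes in the table combine, here is a minimal sketch. It is not `compute_health`: the helper names and the sample recording are hypothetical, it leaves out the flat missing-stream penalties, and it assumes the total is clamped at 0 because the score is documented as 0–100.

```python
def shortfall(actual: float, target: float, weight: float) -> float:
    """(target - actual) * weight when actual is below target, else 0."""
    return max(0.0, target - actual) * weight


def rgb_gap_deduction(n_gaps: int, max_penalty: float = 10.0) -> float:
    """1 point per detected RGB gap, capped at rgb_gap_max_penalty."""
    return min(float(n_gaps), max_penalty)


# Hypothetical recording, scored with the documented defaults.
deductions = {
    "rgb_gaps": rgb_gap_deduction(3),              # 3 gaps -> 3.0
    "depth_valid": shortfall(70.0, 80.0, 0.3),     # 10 points short -> 3.0
    "sync_rgb_depth": shortfall(85.0, 90.0, 0.1),  # -> 0.5
    "sync_rgb_pose": shortfall(95.0, 90.0, 0.1),   # above target -> 0.0
    "hand_any": shortfall(40.0, 50.0, 0.15),       # -> 1.5
    "hand_1plus": shortfall(40.0, 30.0, 0.05),     # above target -> 0.0
    "hand_2": shortfall(12.0, 20.0, 0.10),         # -> 0.8
}
score = max(0.0, 100.0 - sum(deductions.values()))
print(round(score, 2))  # 91.2
```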

Override recipes

Stricter sync

EvaluateConfig(sync_target=95.0, sync_weight=0.2)

Raises the bar to 95% of RGB frames within 50 ms of their nearest depth/pose sample, and doubles the per-point deduction so sync shortfalls weigh more heavily in the score.
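To see how much the recipe tightens things, the arithmetic below applies the per-pair formula from the deductions table to two hypothetical `within_50ms_pct` values. `sync_deduction` is an illustrative helper, not an SDK function.

```python
def sync_deduction(pct: float, target: float, weight: float) -> float:
    # Per-pair shortfall, charged once for RGB<->Depth and once for RGB<->Pose.
    return max(0.0, target - pct) * weight


pairs = {"rgb_depth": 92.0, "rgb_pose": 88.0}  # hypothetical within_50ms_pct values

default = sum(sync_deduction(p, 90.0, 0.1) for p in pairs.values())
strict = sum(sync_deduction(p, 95.0, 0.2) for p in pairs.values())
print(round(default, 2), round(strict, 2))  # 0.2 2.0
```

Raising the target brings the 92% pair into scope, and the doubled weight applies to both pairs, so the same recording loses ten times as many points.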

Single-handed task

EvaluateConfig(hand_2_weight=0.0, hand_2_thresholds=(0.0, 0.0))

Stops penalising a low exactly-2-hand frame percentage and keeps the "2 hands" chip green, since every percentage is ≥ 0. Useful for recordings where one hand is holding the camera.
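A quick before/after check of this recipe, using the documented chip and deduction rules for the "2 hands" check. Both helpers and the 4% figure are hypothetical.

```python
def hand_2_chip(pct: float, thresholds: tuple[float, float]) -> str:
    good_min, ok_min = thresholds
    return "green" if pct >= good_min else "amber" if pct >= ok_min else "red"


def hand_2_deduction(pct: float, target: float, weight: float) -> float:
    return max(0.0, target - pct) * weight


pct = 4.0  # hypothetical: both hands visible in only 4% of frames

# Defaults: thresholds (30, 10), target 20, weight 0.10
print(hand_2_chip(pct, (30.0, 10.0)), hand_2_deduction(pct, 20.0, 0.10))  # red 1.6
# Recipe: thresholds (0, 0), weight 0.0 (target left at its default)
print(hand_2_chip(pct, (0.0, 0.0)), hand_2_deduction(pct, 20.0, 0.0))     # green 0.0
```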

Rig without depth / IMU

EvaluateConfig(depth_required=False, imu_required=False)

Marks both streams optional. A recording from a non-LiDAR phone or an external camera no longer takes the flat 10 + 5 hit for missing those topics.

Headless QC: only fail on missing streams

EvaluateConfig(
    depth_valid_weight=0.0,
    sync_weight=0.0,
    hand_any_weight=0.0,
    hand_1plus_weight=0.0,
    hand_2_weight=0.0,
    rgb_gap_max_penalty=0.0,
)

Disables every threshold-based check and the RGB-gap deduction. The score only drops when a whole stream is missing (depth, pose, IMU), which is the bare minimum you'd want in a CI pipeline.
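With this recipe, the score can only take a handful of values, so it helps to list them before choosing a CI cutoff. The sketch assumes `depth_required` and `imu_required` stay True, and that `hand_missing_penalty` keeps its default of 0. It enumerates every combination of missing streams using the documented flat penalties.

```python
from itertools import combinations

# Documented default flat penalties: the only deductions this recipe leaves.
FLAT = {"depth": 10.0, "pose": 15.0, "imu": 5.0}

scores: dict[float, list[tuple[str, ...]]] = {}
for r in range(len(FLAT) + 1):
    for missing in combinations(FLAT, r):
        score = 100.0 - sum(FLAT[s] for s in missing)
        scores.setdefault(score, []).append(missing)

for score in sorted(scores, reverse=True):
    print(score, scores[score])
```

The output includes `85.0 [('pose',), ('depth', 'imu')]`. A score of 85 is ambiguous under the defaults, so a gate that must distinguish "pose missing" from "depth and IMU missing" should check the individual streams rather than the score alone.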

Require hand annotations

EvaluateConfig(hand_missing_penalty=20.0)

Run this after a detection pipeline that should have called session.add_hand_pose. If the buffer is empty (no hand poses were ever added), the score drops by 20.

Inspecting the active config

Evaluate.compute() stores the config it ran with in the returned metrics dict under "config", and the report prints the same values under Technical reference → Health score config:

metrics = Evaluate(session, config=my_cfg).compute()
assert metrics["config"] is my_cfg
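When logging a run, it can be handy to record only the overrides. The sketch below compares a dataclass instance against a freshly constructed default using `dataclasses.fields`. It assumes every field has a default, as the defaults listing suggests. `MiniEvaluateConfig` is a hypothetical stand-in, so the snippet runs without stera; with the SDK you would pass `metrics["config"]` instead.

```python
from dataclasses import dataclass, fields


# Hypothetical stand-in with a few documented defaults; not the real class.
@dataclass
class MiniEvaluateConfig:
    sync_target: float = 90.0
    sync_weight: float = 0.1
    depth_required: bool = True
    hand_2_weight: float = 0.10


def non_default_fields(cfg) -> dict[str, object]:
    """Return {name: value} for every field that differs from its default."""
    default = type(cfg)()  # assumes all fields have defaults
    return {
        f.name: getattr(cfg, f.name)
        for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(default, f.name)
    }


active = MiniEvaluateConfig(sync_target=95.0, hand_2_weight=0.0)
print(non_default_fields(active))  # {'sync_target': 95.0, 'hand_2_weight': 0.0}
```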

Future revisions may add new fields (e.g. tracking-state penalties). They'll land with safe defaults, so existing configs will continue to score the same way unless you opt in.
