Evaluate

One call generates an interactive HTML report with metrics, charts, and a health score for any Stera recording.

Evaluate is the QC step. Point it at a session and it produces a single self-contained HTML file with metrics for every stream in the recording — RGB, depth, camera trajectory, IMU, sync quality, hand detections, mesh, point cloud — plus a 0–100 health score that flags whatever looks off.

from stera import Evaluate

Evaluate(session).show()                       # opens in your browser
Evaluate(session).export("report.html")        # writes to disk

That covers the common case. No required kwargs.

Where it fits

Evaluate runs after Process has run your detectors and refiners, and either before or after Export bundles the episode. A typical end-of-loop:

for frame in session.frames():
    hands = tracker.detect_hands(frame)
    session.add_hand_pose(frame.index, hands)
    visualizer.log_frame(frame, hands=hands)

session.export("episodes/run_01", visualizer=visualizer)
Evaluate(session).show()                       # ← QC the run

Evaluate reads the session in place: camera poses, IMU samples, depth frames (subsampled), mesh / point-cloud topics, plus any hand poses you buffered via session.add_hand_pose. Nothing needs to be on disk — you can run it on a session you've only iterated once.

What the report looks like

A single HTML file (~19 MB, Plotly.js inlined so it opens offline). Layout:

  • Sticky side-nav jumps between sections.
  • Summary — thumbnail, file name, 8 hero KPIs (duration, frames, FPS, distance, hands present, file size, IMU rate, depth valid), and a big colour-coded health score with notes for each deduction.
  • Trajectory — top-down xz path, height vs time, speed time-series + histogram, and 12+ derived numbers (path length, footprint area, speed percentiles, turn count, stationary %).
  • IMU — accel + gyro magnitude time-series on a twin-axis chart, gravity vector check, jolt count.
  • Depth — per-frame valid-pixel % over time, global depth histogram, range percentiles.
  • Hands — pie chart of frames with 0 / exactly-1 / exactly-2 / >2 hands, confidence histogram per side, detection timeline, plus three headline percentages: ≥1 hand (at least one detection), exactly 2 hands, and >2 hands (typically a detection error). The ≥1 and exactly-2 buckets each have their own configurable colour threshold.
  • Sync — RGB↔Depth and RGB↔Pose offset histograms with a 50 ms threshold marker.
  • 3D map — vertex / face / surface-area / point-cloud size stats.
  • Coverage timeline — which streams have data at each second.
  • Technical reference (collapsed) — recording metadata, topic message counts, intrinsics, TF pairs, tracking state, skeleton, the active EvaluateConfig.
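
The hand-count buckets in the Hands section can be reproduced from a plain list of per-frame detection counts. The following is a minimal sketch of that bucketing, not Stera's implementation: hand_count_buckets and its input list are hypothetical names used only for illustration.

```python
from collections import Counter


def hand_count_buckets(hands_per_frame):
    """Return the percentage of frames in each hand-count bucket.

    hands_per_frame is a hypothetical input: one integer per frame giving
    the number of detected hands in that frame.
    """
    total = len(hands_per_frame)
    if total == 0:
        return {"0": 0.0, "1": 0.0, "2": 0.0, ">2": 0.0, ">=1": 0.0}

    # Clamp every count above 2 to 3 so that key 3 stands for the ">2" bucket.
    counts = Counter(min(n, 3) for n in hands_per_frame)

    def pct(key):
        return 100.0 * counts.get(key, 0) / total

    return {
        "0": pct(0),
        "1": pct(1),
        "2": pct(2),
        ">2": pct(3),
        ">=1": 100.0 - pct(0),  # at least one detection
    }


# Made-up counts for eight frames.
buckets = hand_count_buckets([0, 1, 2, 2, 2, 3, 1, 2])
print(buckets)
```

The ">=1" figure is derived from the zero bucket rather than summed from the others, so it stays consistent even when a frame has more than two detections.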

Every section's headline numbers render as colour chips (green / amber / red). The full numeric table is tucked behind a → Show all X metrics disclosure so the page stays scannable.
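
To make the chip colours concrete, here is one way a "higher is better" metric could be mapped to green / amber / red. This is a sketch under assumptions: the function name chip_colour and the two cut-off values are invented for illustration and are not taken from the Stera source or its defaults.

```python
def chip_colour(value, green_at, amber_at):
    """Classify a higher-is-better metric.

    green_at and amber_at are hypothetical thresholds: values at or above
    green_at are green, values at or above amber_at are amber, the rest red.
    """
    if value >= green_at:
        return "green"
    if value >= amber_at:
        return "amber"
    return "red"


# Made-up sync percentages checked against made-up thresholds.
colours = [chip_colour(v, green_at=95.0, amber_at=80.0) for v in (96.0, 85.0, 40.0)]
print(colours)
```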

Configuring thresholds

Default targets are tuned for hand-manipulation recordings on the Stera capture rig. Override anything via EvaluateConfig:

from stera import Evaluate, EvaluateConfig

cfg = EvaluateConfig(
    sync_target=95.0,         # stricter sync
    hand_2_weight=0.0,        # don't penalize low exactly-2-hand recordings
    depth_required=False,     # don't fail the score when depth is missing
)
Evaluate(session, config=cfg).show()

The active config is dumped in the report's Technical reference → Health score config block, so reviewing exactly which thresholds produced a score is one click away.
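
The exact scoring formula is not documented here. To show why setting a weight to 0.0 removes a deduction, the sketch below uses a hypothetical target-and-weight scheme: each check loses weight × (relative shortfall against its target). The function health_score, the check names, and all numbers are assumptions for illustration, not Stera's actual algorithm.

```python
def health_score(checks):
    """Hypothetical 0-100 score.

    checks is a list of (name, value_pct, target_pct, weight) tuples.
    Returns (score, notes), with one note per deduction.
    """
    score = 100.0
    notes = []
    for name, value, target, weight in checks:
        if target <= 0:
            continue  # nothing to measure against
        shortfall = max(0.0, target - value) / target  # 0.0 once the target is met
        deduction = weight * shortfall
        if deduction > 0:
            score -= deduction
            notes.append(f"{name}: {value:.1f}% vs target {target:.1f}% (-{deduction:.1f})")
    return max(0.0, score), notes


# Made-up values: sync misses its target, and so does the two-hand percentage.
default_score, default_notes = health_score([
    ("sync", 76.0, 95.0, 20.0),
    ("two_hands", 30.0, 60.0, 10.0),
])

# Zeroing the two-hand weight drops that deduction entirely.
relaxed_score, relaxed_notes = health_score([
    ("sync", 76.0, 95.0, 20.0),
    ("two_hands", 30.0, 60.0, 0.0),
])
print(default_score, relaxed_score)
```

Under this scheme a zero weight silences a check without hiding its metric, which matches the intent of the hand_2_weight=0.0 override above.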

API

class Evaluate:
    def __init__(self, session, skeleton=None, config=None) -> None
    def compute(self) -> dict                 # crunch metrics, cached
    def export(self, path: str | Path) -> str # write report.html, return path
    def show(self) -> str                     # write to a temp file + open in browser

Param | Default | Notes
--- | --- | ---
session | required | An MCAPReader. Any hand poses buffered via session.add_hand_pose are picked up automatically.
skeleton | None | Optional list of SkeletonFrame from UpperBodyEstimator. Adds a skeleton-stats block to the report's Technical reference.
config | EvaluateConfig() | Tune thresholds and health-score weights. See Config reference.

compute() returns a plain dict you can inspect in code if you don't need the HTML:

metrics = Evaluate(session).compute()
print(metrics["health"]["score"], metrics["health"]["notes"])
print(metrics["hands"]["frames_with_2_hands_pct"])   # % of frames with exactly 2 hands
print(metrics["trajectory"]["path_length_m"])

The dict's top-level keys mirror the report sections (recording, rgb, depth, trajectory, imu, tracking_state, tf, mesh, point_cloud, sync, hands, skeleton, streamed_rgb, health, config). Missing streams are None.
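
Because missing streams come back as None, code that consumes the dict should guard each lookup. The sketch below shows one way to turn the metrics into a pass/fail gate, for example in a batch job. It runs on a hand-built sample dict that uses only the keys mentioned above; qc_gate, the 80-point minimum, and the sample values are assumptions for illustration.

```python
def qc_gate(metrics, min_score=80.0, required=("trajectory", "hands")):
    """Return a list of problems; an empty list means the run passes.

    metrics is a dict shaped like the one compute() is described as
    returning. min_score and required are hypothetical policy choices.
    """
    problems = []

    health = metrics.get("health") or {}
    score = health.get("score")
    if score is None or score < min_score:
        problems.append(f"health score {score} below {min_score}")

    # A stream that was absent from the recording shows up as None.
    for key in required:
        if metrics.get(key) is None:
            problems.append(f"missing stream: {key}")

    return problems


# Hand-built stand-in for Evaluate(session).compute(); the values are made up.
sample = {
    "health": {"score": 72},
    "trajectory": {"path_length_m": 12.4},
    "hands": None,
}
problems = qc_gate(sample)
print(problems)
```

In real use the sample dict would be replaced by the result of Evaluate(session).compute().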


Evaluate never raises on missing data. Sections without an underlying stream simply render as "no data", the same contract as session.export.