Evaluate
One call generates an interactive HTML report with metrics, charts, and a health score for any Stera recording.
Evaluate is the QC step. Point it at a session and it produces a single self-contained HTML file with metrics for every stream in the recording — RGB, depth, camera trajectory, IMU, sync quality, hand detections, mesh, point cloud — plus a 0–100 health score that flags whatever looks off.
from stera import Evaluate
Evaluate(session).show() # opens in your browser
Evaluate(session).export("report.html") # writes to disk
That's the whole surface. No required kwargs.
Where it fits
Evaluate runs after Process has run your detectors and refiners, and either before or after Export bundles the episode. A typical end-of-loop:
for frame in session.frames():
    hands = tracker.detect_hands(frame)
    session.add_hand_pose(frame.index, hands)
    visualizer.log_frame(frame, hands=hands)
session.export("episodes/run_01", visualizer=visualizer)
Evaluate(session).show() # ← QC the run
Evaluate reads the session in place: camera poses, IMU samples, depth frames (subsampled), mesh / point-cloud topics, plus any hand poses you buffered via session.add_hand_pose. Nothing needs to be on disk; you can run it on a session you've only iterated once.
What the report looks like
A single HTML file (~19 MB, Plotly.js inlined so it opens offline). Layout:
- Sticky side-nav jumps between sections.
- Summary — thumbnail, file name, 8 hero KPIs (duration, frames, FPS, distance, hands present, file size, IMU rate, depth valid), and a big colour-coded health score with notes for each deduction.
- Trajectory — top-down xz path, height vs time, speed time-series + histogram, and 12+ derived numbers (path length, footprint area, speed percentiles, turn count, stationary %).
- IMU — accel + gyro magnitude time-series on a twin-axis chart, gravity vector check, jolt count.
- Depth — per-frame valid-pixel % over time, global depth histogram, range percentiles.
- Hands — pie chart of frames with 0 / exactly-1 / exactly-2 / >2 hands, confidence histogram per side, detection timeline, plus three headline percentages: ≥1 hand (at least one detection), exactly 2 hands, and >2 hands (typically a detection error). The ≥1 and exactly-2 buckets each have their own configurable colour threshold.
- Sync — RGB↔Depth and RGB↔Pose offset histograms with a 50 ms threshold marker.
- 3D map — vertex / face / surface-area / point-cloud size stats.
- Coverage timeline — which streams have data at each second.
- Technical reference (collapsed) — recording metadata, topic message counts, intrinsics, TF pairs, tracking state, skeleton, the active EvaluateConfig.
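The Hands buckets above can be illustrated with a few lines of plain Python. This is a minimal sketch, not Evaluate's implementation: `hand_bucket_percentages` and its output keys are hypothetical names, and the input is assumed to be one detection count per frame.

```python
from collections import Counter


def hand_bucket_percentages(hands_per_frame):
    """Summarise per-frame hand counts into 0 / 1 / 2 / >2 buckets (hypothetical helper)."""
    total = len(hands_per_frame)
    if total == 0:
        return None  # no frames, nothing to summarise

    # Clamp every count above 2 into a single ">2" bucket stored under key 3.
    buckets = Counter(min(n, 3) for n in hands_per_frame)

    def pct(count):
        return 100.0 * count / total

    return {
        "zero_pct": pct(buckets[0]),
        "exactly_1_pct": pct(buckets[1]),
        "exactly_2_pct": pct(buckets[2]),
        "more_than_2_pct": pct(buckets[3]),
        "at_least_1_pct": pct(total - buckets[0]),
    }


counts = [0, 1, 2, 2, 2, 3, 2, 1, 2, 2]  # made-up hand counts for ten frames
summary = hand_bucket_percentages(counts)
print(summary)
```

Note that the ≥1 figure includes the >2 frames, so the three headline percentages are not meant to sum to 100.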
Every section's headline numbers render as colour chips (green / amber / red). The full numeric table is tucked behind a → Show all X metrics disclosure so the page stays scannable.
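To make the Sync section's 50 ms marker concrete, here is one way offsets between two streams could be measured. How Evaluate actually pairs frames is not described on this page; the sketch assumes nearest-timestamp matching, timestamps in seconds, and the hypothetical helpers `nearest_offsets_ms` and `pct_within`.

```python
from bisect import bisect_left


def nearest_offsets_ms(ref_ts, other_ts):
    """Absolute offset in ms from each reference timestamp to the nearest other timestamp."""
    other = sorted(other_ts)
    if not other:
        return []  # nothing to match against
    offsets = []
    for t in ref_ts:
        i = bisect_left(other, t)
        # The nearest neighbour is one of the entries around the insertion point.
        candidates = other[max(i - 1, 0):i + 1]
        offsets.append(min(abs(t - c) for c in candidates) * 1000.0)
    return offsets


def pct_within(offsets_ms, threshold_ms=50.0):
    """Share of offsets at or below the threshold, as a percentage."""
    if not offsets_ms:
        return None
    within = sum(1 for o in offsets_ms if o <= threshold_ms)
    return 100.0 * within / len(offsets_ms)


rgb_ts = [0.000, 0.100, 0.200, 0.300]    # made-up RGB timestamps (s)
depth_ts = [0.010, 0.105, 0.280, 0.390]  # made-up depth timestamps (s)
offsets = nearest_offsets_ms(rgb_ts, depth_ts)
print(pct_within(offsets))
```

In this made-up data the third RGB frame is about 80 ms from its nearest depth frame, so three of the four frames fall inside the marker.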
Configuring thresholds
Default targets are tuned for hand-manipulation recordings on the Stera capture rig. Override anything via EvaluateConfig:
from stera import Evaluate, EvaluateConfig
cfg = EvaluateConfig(
    sync_target=95.0, # stricter sync
    hand_2_weight=0.0, # don't penalize low exactly-2-hand recordings
    depth_required=False, # don't fail the score when depth is missing
)
Evaluate(session, config=cfg).show()
The active config is dumped in the report's Technical reference → Health score config block, so reviewing exactly which thresholds produced a score is one click away.
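To show how targets and weights like these might interact, here is a toy deduction-based score. It is an illustration only and not Stera's ruleset, which lives on the Health score page. `SketchConfig` is a hypothetical stand-in for EvaluateConfig: only `sync_target`, `hand_2_weight`, and `depth_required` come from the example above, while every other field, all default values, and the metric keys are invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for EvaluateConfig; all defaults here are invented."""
    sync_target: float = 90.0            # assumed: min % of frames within the sync threshold
    sync_weight: float = 20.0            # hypothetical field: points lost when below target
    hand_2_target: float = 50.0          # hypothetical field: min % of exactly-2-hand frames
    hand_2_weight: float = 10.0          # points lost when below target
    depth_required: bool = True
    depth_missing_penalty: float = 30.0  # hypothetical field


def sketch_health(metrics, cfg):
    """Start at 100, subtract a weight per failed check, and keep a note for each deduction."""
    score = 100.0
    notes = []

    def deduct(points, reason):
        nonlocal score
        if points > 0:  # a zero weight switches the check off
            score -= points
            notes.append(f"-{points:g}: {reason}")

    sync = metrics.get("sync_within_pct")
    if sync is not None and sync < cfg.sync_target:
        deduct(cfg.sync_weight, f"sync {sync:g}% below target {cfg.sync_target:g}%")

    two_hands = metrics.get("exactly_2_hands_pct")
    if two_hands is not None and two_hands < cfg.hand_2_target:
        deduct(cfg.hand_2_weight, f"exactly-2-hand frames {two_hands:g}% below target")

    if metrics.get("depth_valid_pct") is None and cfg.depth_required:
        deduct(cfg.depth_missing_penalty, "depth stream missing")

    return {"score": max(0.0, score), "notes": notes}


run = {"sync_within_pct": 92.0, "exactly_2_hands_pct": 30.0, "depth_valid_pct": None}
default_result = sketch_health(run, SketchConfig())
relaxed_result = sketch_health(
    run, SketchConfig(sync_target=95.0, hand_2_weight=0.0, depth_required=False)
)
print(default_result["score"], default_result["notes"])
print(relaxed_result["score"], relaxed_result["notes"])
```

With the same made-up run, the overrides change which checks fire: the stricter sync target now costs points, while the two-hand and depth deductions disappear.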
API
class Evaluate:
    def __init__(self, session, skeleton=None, config=None) -> None
    def compute(self) -> dict # crunch metrics, cached
    def export(self, path: str | Path) -> str # write report.html, return path
    def show(self) -> str # write to a temp file + open in browser

| Param | Default | Notes |
|---|---|---|
| session | required | An MCAPReader. Any hand poses buffered via session.add_hand_pose are picked up automatically. |
| skeleton | None | Optional list of SkeletonFrame from UpperBodyEstimator. Adds a skeleton-stats block to the report's Technical reference. |
| config | EvaluateConfig() | Tune thresholds and health-score weights. See Config reference. |
compute() returns a plain dict you can inspect in code if you don't need the HTML:
metrics = Evaluate(session).compute()
print(metrics["health"]["score"], metrics["health"]["notes"])
print(metrics["hands"]["frames_with_2_hands_pct"]) # % of frames with exactly 2 hands
print(metrics["trajectory"]["path_length_m"])
The dict's top-level keys mirror the report sections (recording, rgb, depth, trajectory, imu, tracking_state, tf, mesh, point_cloud, sync, hands, skeleton, streamed_rgb, health, config). Missing streams are None.
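Because a missing stream shows up as None rather than a nested dict, chained lookups such as `metrics["depth"][...]` will fail on recordings without that stream. A small guard avoids this. The sketch below uses a hand-written stand-in for the dict `compute()` returns so that it runs without a session; `get_metric` is a hypothetical helper, and the values and the `valid_pct` key are made up.

```python
def get_metric(metrics, section, key, default=None):
    """Read metrics[section][key], falling back to default when the section is None or absent."""
    block = metrics.get(section)
    if block is None:
        return default
    return block.get(key, default)


# Stand-in for the dict returned by compute(); the values are invented.
metrics = {
    "health": {"score": 82, "notes": ["example note"]},
    "trajectory": {"path_length_m": 12.4},
    "depth": None,  # this recording has no depth stream
}

path_length = get_metric(metrics, "trajectory", "path_length_m")
depth_valid = get_metric(metrics, "depth", "valid_pct")  # hypothetical key name
print(path_length, depth_valid)
```

The same helper works for sections that are absent from the dict altogether, which keeps batch scripts from crashing on heterogeneous recordings.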
What's in these docs
Metrics
Every number Evaluate computes, grouped by section, with units and what it measures.
Health score
How the 0–100 score is computed, the full deduction ruleset, and the colour thresholds.
Config
EvaluateConfig reference — every tunable threshold and weight, with override recipes.
Evaluate never raises on missing data. Sections without an underlying stream simply render as "no data", the same contract as session.export.