Metrics

Every metric Evaluate computes, grouped by section. Use this as the reference for what each number means.

Evaluate.compute() returns a dict whose top-level keys mirror the report sections. This page is the field-level reference: what each metric is, its units, and where it shows up in the HTML.

metrics = Evaluate(session).compute()
metrics["trajectory"]["path_length_m"]   # 12.43
metrics["hands"]["frames_with_2_hands_pct"]  # 41.2  (frames with exactly 2 hands)

Anything that can't be computed (missing stream, empty buffer) becomes None.
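
Because any value may be None, downstream code is safer when it reads metrics through a small guard. The helper below is a hypothetical sketch, not part of Evaluate. It assumes that either a whole section or a single key can be None, which this page does not spell out, and the sample dict is a hand-written stand-in for a real compute() result.

```python
def get_metric(metrics, section, key, default=None):
    """Return metrics[section][key], or default when the section or value is missing or None."""
    block = metrics.get(section) or {}
    value = block.get(key)
    return default if value is None else value


# Hand-written stand-in for Evaluate(session).compute() on a recording without depth.
sample_metrics = {
    "trajectory": {"path_length_m": 12.43},
    "depth": None,
}

path_length = get_metric(sample_metrics, "trajectory", "path_length_m")
valid_pct = get_metric(sample_metrics, "depth", "valid_pct_mean", default=0.0)
print(path_length, valid_pct)
```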

Recording

metrics["recording"] — file-level facts pulled from the MCAP summary.

| Key | Type | Notes |
| --- | --- | --- |
| path, filename | str | Absolute path and bare filename. |
| size_bytes, size_mb | int, float | File size on disk. |
| duration_s, duration_hms | float, str | Seconds and HH:MM:SS. |
| start_time, end_time | float | Epoch seconds. |
| start_iso, end_iso, weekday | str | Human-readable wrappers. |
| message_count | int | Total MCAP messages across all topics. |
| topic_counts | dict[str, int] | Per-topic message counts. Rendered as a collapsible table. |
| topics_present_count | int | |
| missing_reference_topics | list[str] | Reference topics from MCAPReader.REFERENCE_TOPICS that have zero messages. |

RGB stream

metrics["rgb"] — pulled from session._rgb_ts() and session.rgb_intrinsics.

| Key | Type | Notes |
| --- | --- | --- |
| frame_count | int | |
| effective_fps | float | frame_count / duration. |
| median_dt_ms, min_dt_ms, max_dt_ms, dt_std_ms | float | Inter-frame interval statistics. |
| gap_count | int | Number of inter-frame intervals greater than 2 × median_dt. Feeds into the health score. |
| intrinsics | dict | See Intrinsics below. |
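
The interval statistics and the gap rule can be sketched from a plain list of timestamps. This is an illustration under assumptions, not the library's code: interval_stats is a hypothetical name, timestamps are taken to be in seconds, and the population standard deviation is used because the page does not say which variant dt_std_ms reports.

```python
import statistics


def interval_stats(timestamps_s):
    """Inter-frame interval statistics in milliseconds, plus the gap count."""
    dts_ms = [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]
    if not dts_ms:
        return None  # fewer than two frames: nothing to measure
    median_dt = statistics.median(dts_ms)
    return {
        "median_dt_ms": median_dt,
        "min_dt_ms": min(dts_ms),
        "max_dt_ms": max(dts_ms),
        "dt_std_ms": statistics.pstdev(dts_ms),  # assumption: population std-dev
        # A gap is any interval strictly greater than twice the median interval.
        "gap_count": sum(1 for dt in dts_ms if dt > 2 * median_dt),
    }


# One dropped stretch between t=2 s and t=5 s.
stats = interval_stats([0.0, 1.0, 2.0, 5.0, 6.0])
print(stats["gap_count"])
```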

Intrinsics

Same shape for both rgb.intrinsics and depth.intrinsics.

| Key | Type | Notes |
| --- | --- | --- |
| width, height | int | |
| fx, fy, cx, cy | float | |
| fx_over_fy | float | |
| fov_x_deg, fov_y_deg | float | Derived from focal length and image dimensions. |
| aspect_ratio | float | |
| principal_offset_px | (float, float) | cx - w/2, cy - h/2. |
| principal_offset_pct | (float, float) | Same, normalised to image size. |
| distortion_model | str | E.g. "plumb_bob". |
| distortion | list[float] | Distortion coefficient values. |
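
The derived fields follow from the raw intrinsics. The sketch below uses the standard pinhole relation fov = 2 · atan(size / (2 · f)). That Evaluate uses exactly this formula, and that principal_offset_pct is expressed as a percentage of width and height, are assumptions. derived_intrinsics is a hypothetical helper.

```python
import math


def derived_intrinsics(width, height, fx, fy, cx, cy):
    """Derive FOV, aspect ratio and principal-point offsets from pinhole intrinsics."""
    off_x = cx - width / 2
    off_y = cy - height / 2
    return {
        "fx_over_fy": fx / fy,
        "fov_x_deg": math.degrees(2 * math.atan(width / (2 * fx))),
        "fov_y_deg": math.degrees(2 * math.atan(height / (2 * fy))),
        "aspect_ratio": width / height,
        "principal_offset_px": (off_x, off_y),
        # Assumption: offsets normalised as a percentage of the image size.
        "principal_offset_pct": (100.0 * off_x / width, 100.0 * off_y / height),
    }


intr = derived_intrinsics(width=640, height=480, fx=320.0, fy=320.0, cx=330.0, cy=240.0)
print(intr["fov_x_deg"], intr["principal_offset_px"])
```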

Depth stream

metrics["depth"] — depth frames are iterated once with stride every_n = max(1, num_depth // 200) so any-length recording finishes in a few seconds. Stats below come from those sampled frames.

| Key | Type | Notes |
| --- | --- | --- |
| frame_count | int | Full count, not the sampled count. |
| effective_fps | float | |
| median_dt_ms | float | |
| sampled_frames | int | How many frames the stats below came from. |
| valid_pct_mean | float | Average % of pixels with depth > 0. Colour-coded by depth_valid_thresholds. Feeds into the health score. |
| valid_pct_min, valid_pct_max, valid_pct_std | float | |
| empty_frame_count | int | Frames where every pixel is zero. |
| global_min_m, global_max_m | float | Across all valid pixels in the sample set. |
| depth_percentiles_m | dict | {"p5", "p50", "p95"} in metres. |
| depth_hist_counts, depth_hist_pct | dict | Buckets <1m / 1-2m / 2-5m / >5m. |
| intrinsics | dict | See Intrinsics. |
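
The sampling stride and the per-frame valid percentage can be sketched as follows. Only the every_n formula comes from this page. That sampling starts at index 0, and the helper names sample_indices and valid_pct, are assumptions. Frames are modelled as nested lists to keep the example dependency-free.

```python
def sample_indices(num_depth, target=200):
    """Indices of the depth frames visited with stride every_n = max(1, num_depth // target)."""
    every_n = max(1, num_depth // target)
    return list(range(0, num_depth, every_n))


def valid_pct(frame):
    """Percentage of pixels in a 2D depth frame with depth > 0."""
    pixels = [d for row in frame for d in row]
    return 100.0 * sum(1 for d in pixels if d > 0) / len(pixels)


print(len(sample_indices(1000)))  # stride 5
print(len(sample_indices(399)))   # stride 1: integer division keeps every frame
print(valid_pct([[0.0, 1.5], [2.0, 0.0]]))
```

Because the stride uses integer division, the sampled count is not capped at exactly 200: under these assumptions a 399-frame recording still has stride 1 and visits all 399 frames.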

Camera trajectory

metrics["trajectory"] — derived from session.all_camera_poses(). World frame is the MCAP's pose frame; height = Y.

| Key | Type | Notes |
| --- | --- | --- |
| pose_count | int | |
| effective_rate_hz | float | |
| path_length_m | float | Sum of segment lengths. |
| net_displacement_m | float | Start-to-end distance. |
| tortuosity | float | path_length_m / net_displacement_m. |
| bbox_min, bbox_max, bbox_extents | list[float] | Axis-aligned bounding box in world frame. |
| bbox_volume_m3 | float | |
| footprint_area_m2 | float | Convex-hull area of positions projected onto the XZ plane. |
| height_min_m, height_max_m, height_mean_m, height_std_m | float | Y-axis distribution. |
| speed_mean_mps, speed_median_mps, speed_p95_mps, speed_max_mps | float | Segment length over segment dt. |
| accel_mean_mps2, accel_max_mps2 | float | Finite difference of speed. |
| yaw_rate_deg_per_s, pitch_rate_deg_per_s, roll_rate_deg_per_s | float | Median absolute angular rate per axis, from rotation-matrix ZYX Euler decomposition. |
| cumulative_rotation_deg | float | Sum of abs(Δheading). |
| turn_count | int | Heading steps greater than 45°. |
| stationary_duration_s, stationary_pct | float | Time with speed < 0.05 m/s. |

Plot inputs (ts_series, positions, speed_ts, speed_series, headings_deg) are also in the dict.
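
The core path metrics can be reproduced from positions and timestamps. A minimal sketch, assuming positions are (x, y, z) tuples in metres and timestamps are in seconds. path_metrics is a hypothetical helper, and returning None for tortuosity on a closed loop (zero net displacement) is an assumption about how the division by zero is handled.

```python
import math


def path_metrics(positions, timestamps_s):
    """Path length, net displacement, tortuosity and peak speed for a pose sequence."""
    segments = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    dts = [t1 - t0 for t0, t1 in zip(timestamps_s, timestamps_s[1:])]
    path_length = sum(segments)
    net = math.dist(positions[0], positions[-1])
    # Speed per segment is segment length over segment dt; skip zero-length intervals.
    speeds = [s / dt for s, dt in zip(segments, dts) if dt > 0]
    return {
        "path_length_m": path_length,
        "net_displacement_m": net,
        "tortuosity": path_length / net if net > 0 else None,
        "speed_max_mps": max(speeds) if speeds else None,
    }


# An L-shaped walk: 3 m along X, then 4 m along Z.
walk = path_metrics([(0, 0, 0), (3, 0, 0), (3, 0, 4)], [0.0, 1.0, 3.0])
print(walk)
```

A tortuosity of 1.0 means a straight line; larger values mean the camera wandered relative to where it ended up.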

IMU

metrics["imu"] — from session.all_imu_samples().

| Key | Type | Notes |
| --- | --- | --- |
| sample_count | int | |
| effective_rate_hz | float | |
| rate_jitter_ms | float | Std-dev of inter-sample intervals. |
| accel_axis_mean / std / min / max | list[float] | Per-axis (x, y, z) m/s². |
| accel_mag_mean / std / p95 / max | float | Magnitude statistics. |
| gyro_axis_mean / std / min / max | list[float] | Per-axis rad/s. |
| gyro_mag_mean / std / p95 / max | float | |
| gravity_vector | list[float] | Mean accel vector; approximates the gravity direction. |
| gravity_magnitude | float | Norm of the mean accel vector. |
| gravity_deviation | float | abs(9.81 − gravity_magnitude). Colour-coded by imu_gravity_max_dev. |
| jolt_count | int | Samples with accel magnitude > 20 m/s². |
| high_rotation_events | int | Samples with gyro magnitude > 2 rad/s. |
| motion_duration_s, still_duration_s | float | Time accel-mag is above / below the gravity-normalised threshold. |
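
The gravity fields and the jolt count follow directly from the accelerometer samples. A sketch under assumptions: imu_gravity_stats is a hypothetical name, samples are (x, y, z) tuples in m/s², and the thresholds are the ones listed in the table.

```python
import math


def imu_gravity_stats(accel_samples, jolt_threshold=20.0):
    """Mean accel vector as a gravity estimate, its deviation from 9.81, and the jolt count."""
    n = len(accel_samples)
    gravity_vector = [sum(s[i] for s in accel_samples) / n for i in range(3)]
    gravity_magnitude = math.sqrt(sum(c * c for c in gravity_vector))
    magnitudes = [math.sqrt(sum(c * c for c in s)) for s in accel_samples]
    return {
        "gravity_vector": gravity_vector,
        "gravity_magnitude": gravity_magnitude,
        "gravity_deviation": abs(9.81 - gravity_magnitude),
        "jolt_count": sum(1 for m in magnitudes if m > jolt_threshold),
    }


# Two quiet samples and one spike along Y.
imu = imu_gravity_stats([(0.0, 9.0, 0.0), (0.0, 10.0, 0.0), (0.0, 25.0, 0.0)])
print(imu["gravity_magnitude"], imu["jolt_count"])
```

The example also shows why a spiky recording inflates gravity_deviation: the mean is taken over all samples, jolts included.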

Tracking state

metrics["tracking_state"] — from /camera/tracking_state if present.

| Key | Type | Notes |
| --- | --- | --- |
| message_count | int | |
| state_counts | dict[str, int] | Counts keyed by state_str from the tracking-state message. |
| state_pct | dict[str, float] | Percentages of total messages. |

TF transforms

metrics["tf"] — every /tf message decoded once.

| Key | Type | Notes |
| --- | --- | --- |
| message_count | int | |
| unique_pair_count | int | |
| pairs | list[dict] | One row per parent→child pair: {parent, child, count, rate_hz}. |

Trajectory topic

metrics["trajectory_topic"] — the /trajectory topic (separate from /camera/pose).

| Key | Type | Notes |
| --- | --- | --- |
| pose_count | int | |
| path_length_m | float | Same calculation as in trajectory, but from the /trajectory poses. |

Mesh

metrics["mesh"] — from /map/mesh.

| Key | Type | Notes |
| --- | --- | --- |
| vertex_count, face_count | int | |
| bbox_extents, bbox_volume_m3 | list[float], float | |
| surface_area_m2 | float | Sum of triangle areas via cross product. |
| edge_length_mean_m, edge_length_p5_m, edge_length_p95_m | float | Edge-length distribution across all triangles. |
| color_coverage_pct | float | % of vertices that aren't the default grey ([128,128,128]). |
| verts_per_m2, faces_per_m2 | float | Vertex and face counts divided by surface area. |
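
The surface-area calculation is the usual half-norm of the cross product, summed over faces. A minimal sketch, assuming vertices are (x, y, z) tuples in metres and faces are index triples. mesh_surface_area is a hypothetical name, and treating verts_per_m2 as vertex count over surface area is the interpretation used here.

```python
import math


def triangle_area(a, b, c):
    """Area of one triangle: half the norm of (b - a) x (c - a)."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def mesh_surface_area(vertices, faces):
    """Sum of triangle areas over all faces."""
    return sum(triangle_area(vertices[i], vertices[j], vertices[k]) for i, j, k in faces)


# A unit square in the XY plane, split into two triangles.
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]
area = mesh_surface_area(vertices, faces)
verts_per_m2 = len(vertices) / area
print(area, verts_per_m2)
```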

Point cloud

metrics["point_cloud"] — from /map/mesh_cloud if present, else /map/point_cloud.

| Key | Type | Notes |
| --- | --- | --- |
| point_count | int | |
| bbox_extents, bbox_volume_m3 | list[float], float | |
| density_pts_per_m3 | float | |
| color_coverage_pct | float | % of points with non-zero RGB. |

Sync quality

metrics["sync"] — nearest-neighbour offsets between RGB timestamps and each other stream. Each sub-block has:

| Key | Type | Notes |
| --- | --- | --- |
| median_ms, p95_ms, max_ms | float | Absolute time-offset statistics across all RGB timestamps. |
| within_50ms_pct, within_100ms_pct | float | Percentage of RGB frames whose nearest match is within the bucket. The 50 ms number is colour-coded by sync_thresholds and feeds into the health score. |

Sub-blocks:

  • rgb_vs_depth
  • rgb_vs_pose
  • rgb_vs_imu
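
The nearest-neighbour offsets behind each sub-block can be sketched with a binary search over the other stream's timestamps. This is an illustration, not the library's implementation: sync_block is a hypothetical name, timestamps are assumed to be in seconds, the bucket boundaries are treated as inclusive, and p95_ms is omitted because the percentile method is not specified here.

```python
import bisect
import statistics


def nearest_offsets_ms(rgb_ts, other_ts):
    """Absolute offset (ms) from each RGB timestamp to its nearest neighbour in other_ts."""
    other = sorted(other_ts)
    offsets = []
    for t in rgb_ts:
        i = bisect.bisect_left(other, t)
        # Only the entries just before and just after t can be the nearest.
        candidates = other[max(0, i - 1):i + 1]
        offsets.append(min(abs(t - c) for c in candidates) * 1000.0)
    return offsets


def sync_block(rgb_ts, other_ts):
    """Summary of RGB-to-stream offsets, or None when either stream is empty."""
    if not rgb_ts or not other_ts:
        return None
    offsets = nearest_offsets_ms(rgb_ts, other_ts)
    n = len(offsets)
    return {
        "median_ms": statistics.median(offsets),
        "max_ms": max(offsets),
        "within_50ms_pct": 100.0 * sum(o <= 50.0 for o in offsets) / n,
        "within_100ms_pct": 100.0 * sum(o <= 100.0 for o in offsets) / n,
    }


rgb_vs_depth = sync_block([0.0, 1.0, 2.0], [0.01, 1.2, 2.0])
print(rgb_vs_depth)
```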

Hands

metrics["hands"] — only populated when session.add_hand_pose(frame.index, hands) was called during the loop.

Three headline buckets are reported alongside the per-side detection rates:

  • ≥1 hand (frames_with_1plus_hand*) — at least one detection in the frame. Same value as frames_with_any_hand and the natural "did we see hands at all" KPI.
  • Exactly 2 hands (frames_with_2_hands*) — both hands cleanly detected, no spurious extras. The most useful KPI for two-handed manipulation tasks.
  • More than 2 hands (frames_with_more_hands*) — typically a detection error (false-positive on background, mirrored reflection, second person in frame). Reported as a metric only; no health-score deduction by default.

The pie chart in the report shows the disjoint exact-count buckets (no hands / exactly 1 / exactly 2 / >2) derived on the fly from counts_per_frame.

| Key | Type | Notes |
| --- | --- | --- |
| backend | str | E.g. "wilor", "mediapipe", "hamer". |
| frames_total | int | Same as num_rgb_frames. |
| frames_with_any_hand | int | Frames with ≥1 hand (alias for frames_with_1plus_hand). |
| frames_with_any_hand_pct | float | Colour-coded by hand_any_thresholds. Feeds into the health score. |
| frames_with_1plus_hand, frames_with_1plus_hand_pct | int, float | Frames with at least 1 hand. Coloured by hand_1plus_thresholds. |
| frames_with_2_hands, frames_with_2_hands_pct | int, float | Frames with exactly 2 hands. Coloured by hand_2_thresholds. |
| frames_with_more_hands, frames_with_more_hands_pct | int, float | Frames with strictly more than 2 hands. Metric only; no score deduction by default. |
| counts_per_frame | np.ndarray (frames_total,) | Number of hand detections per frame (clipped at 100). Used by the pie chart to derive exact buckets. |
| left_detection_pct, right_detection_pct | float | Per-side rate. |
| both_hands_pct | float | Frames with both sides detected. |
| left_conf_mean, left_conf_p10 / p50 / p90 | float | Per-side confidence stats. Same for right_conf_*. |
| has_3d | bool | True if any hand had non-zero z. |
| mano_frames_pct | float | % of any-hand frames where MANO vertices were attached. None for non-MANO backends. |
| kpts_in_frame_pct | float | Of all 2D keypoints, the percentage inside the RGB image bounds. |
| left_wrist_depth_mean_m, right_wrist_depth_mean_m | float | Mean Z (camera-frame depth) of the wrist joint. |
| palm_width_mean_m | float | Distance between MCP joints 5 and 17. |
| grip_closure_mean_m | float | Mean tip-to-wrist distance across the 5 fingertips. |
| left_wrist_track, right_wrist_track | dict | {length_m, speed_mean_mps, speed_max_mps} in the camera frame, assuming even frame timing. |
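
The headline buckets and the pie-chart buckets can both be derived from counts_per_frame. The sketch below uses a plain list in place of the NumPy array; hand_buckets and the pie key names are hypothetical, while the frames_with_* keys mirror the table above.

```python
def hand_buckets(counts_per_frame):
    """Headline hand-count buckets plus the disjoint exact-count buckets used for the pie chart."""
    total = len(counts_per_frame)
    none = sum(1 for n in counts_per_frame if n == 0)
    one = sum(1 for n in counts_per_frame if n == 1)
    two = sum(1 for n in counts_per_frame if n == 2)
    more = sum(1 for n in counts_per_frame if n > 2)

    def pct(count):
        return 100.0 * count / total if total else None

    return {
        "frames_total": total,
        "frames_with_1plus_hand": total - none,
        "frames_with_1plus_hand_pct": pct(total - none),
        "frames_with_2_hands": two,
        "frames_with_2_hands_pct": pct(two),
        "frames_with_more_hands": more,
        "frames_with_more_hands_pct": pct(more),
        # Disjoint buckets: these four always sum to frames_total.
        "pie": {"no_hands": none, "exactly_1": one, "exactly_2": two, "more_than_2": more},
    }


buckets = hand_buckets([0, 1, 2, 2, 3])
print(buckets["frames_with_2_hands_pct"], buckets["pie"])
```

Note that the headline buckets overlap (a frame with exactly 2 hands also counts towards ≥1 hand), whereas the pie buckets do not.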

Skeleton

metrics["skeleton"] — only when you pass skeleton=... to Evaluate.

| Key | Type | Notes |
| --- | --- | --- |
| frame_count | int | |
| detection_pct | float | Always 100 today (present-frame ratio). |
| joint_visibility_pct | dict[str, float] | Per-joint visibility (head, neck, spine, l_shoulder, l_elbow, l_wrist, r_shoulder, r_elbow, r_wrist, mount_cam). |
| elbow_left_mean_deg, elbow_right_mean_deg | float | Angle at the elbow joint, in degrees. |
| reach_left_mean_m, reach_right_mean_m | float | Shoulder-to-wrist distance. |
| head_height_mean_m | float | Mean Y of the head joint. |
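
The elbow angle and reach for a single frame can be sketched from three joint positions. This assumes the angle is measured between the elbow-to-shoulder and elbow-to-wrist vectors, so a straight arm reads 180°. That convention, and the helper names elbow_angle_deg and reach_m, are assumptions rather than confirmed behaviour.

```python
import math


def elbow_angle_deg(shoulder, elbow, wrist):
    """Angle at the elbow between the upper arm and forearm, in degrees."""
    u = [shoulder[i] - elbow[i] for i in range(3)]
    v = [wrist[i] - elbow[i] for i in range(3)]
    norm_u = math.sqrt(sum(c * c for c in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_u == 0 or norm_v == 0:
        return None  # degenerate: two joints coincide
    cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def reach_m(shoulder, wrist):
    """Straight-line shoulder-to-wrist distance in metres."""
    return math.dist(shoulder, wrist)


bent = elbow_angle_deg((0, 0, 0), (1, 0, 0), (1, 1, 0))
straight = elbow_angle_deg((0, 0, 0), (1, 0, 0), (2, 0, 0))
print(bent, straight, reach_m((0, 0, 0), (3, 4, 0)))
```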

Streamed rgb.mp4

metrics["streamed_rgb"] — only when session.add_rgb_frame was used.

| Key | Type | Notes |
| --- | --- | --- |
| active | bool | True while the H.264 writer is still open. |
| tmp_path, tmp_size_bytes | str, int | Where the temp mp4 lives and how big it is at compute time. |
| width, height, fps | int, float | Pulled from session.rgb_intrinsics. |
| has_thumbnail | bool | Whether session._rgb_mid_frame was captured. |

Health

metrics["health"] — the rollup. See Health score for how it's computed.

| Key | Type | Notes |
| --- | --- | --- |
| score | float | 0..100, clamped. |
| notes | list[str] | One human-readable line per active deduction. Rendered next to the score. |

Config snapshot

metrics["config"] — the EvaluateConfig instance used for the run. Surfaced so the report can dump the active thresholds and so downstream code can diff configs.

Plot input arrays (ts_series, positions, accel_mag_series, global_depth_samples, etc.) are kept on the dict alongside the scalars so plotting and metric layers stay decoupled. They're safe to ignore from Python.