Metrics
Every metric Evaluate computes, grouped by section. Use this as the reference for what each number means.
Evaluate.compute() returns a dict whose top-level keys mirror the report sections. This page is the field-level reference: what each metric is, its units, and where it shows up in the HTML.
```python
metrics = Evaluate(session).compute()
metrics["trajectory"]["path_length_m"]       # 12.43
metrics["hands"]["frames_with_2_hands_pct"]  # 41.2 (frames with exactly 2 hands)
```
Anything that can't be computed (missing stream, empty buffer) becomes None.
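Because any metric can be None, downstream code usually wants a guarded lookup. A minimal sketch, assuming the result is a plain nested dict; `get_metric` is a hypothetical helper, not part of Evaluate, and the idea that a whole section can be None is an assumption made here for safety.

```python
from typing import Any


def get_metric(metrics: dict, section: str, key: str, default: Any = None) -> Any:
    """Return metrics[section][key], falling back to `default` when the
    section is absent, the key is absent, or the stored value is None."""
    block = metrics.get(section) or {}
    value = block.get(key)
    return default if value is None else value


# Hand-written stand-in for the dict returned by Evaluate(session).compute().
example = {
    "trajectory": {"path_length_m": 12.43},
    "hands": {"frames_with_2_hands_pct": None},  # metric could not be computed
    "imu": None,  # assumption: a whole section may also be None
}

print(get_metric(example, "trajectory", "path_length_m"))
print(get_metric(example, "hands", "frames_with_2_hands_pct", default=0.0))
```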
Recording
metrics["recording"] — file-level facts pulled from the MCAP summary.
| Key | Type | Notes |
|---|---|---|
path, filename | str | Absolute path and bare filename. |
size_bytes, size_mb | int, float | File size on disk. |
duration_s, duration_hms | float, str | Seconds and HH:MM:SS. |
start_time, end_time | float | Epoch seconds. |
start_iso, end_iso, weekday | str | Human-readable wrappers. |
message_count | int | Total MCAP messages across all topics. |
topic_counts | dict[str, int] | Per-topic message counts. Rendered as a collapsible table. |
topics_present_count | int | |
missing_reference_topics | list[str] | Reference topics from MCAPReader.REFERENCE_TOPICS that have zero messages. |
RGB stream
metrics["rgb"] — pulled from session._rgb_ts() and session.rgb_intrinsics.
| Key | Type | Notes |
|---|---|---|
frame_count | int | |
effective_fps | float | frame_count / duration. |
median_dt_ms, min_dt_ms, max_dt_ms, dt_std_ms | float | Inter-frame interval statistics. |
gap_count | int | Number of inter-frame intervals greater than 2 × median_dt. Feeds into the health score. |
intrinsics | dict | See Intrinsics below. |
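
The interval statistics and gap_count can be reproduced from a list of timestamps. A minimal sketch, not Evaluate's own code: `interval_stats` is a hypothetical name, the duration is assumed to be the first-to-last span of the stream, and the population standard deviation is an assumption.

```python
import statistics


def interval_stats(timestamps_s):
    """Inter-frame interval statistics from sorted timestamps in seconds."""
    dts_ms = [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]
    if not dts_ms:
        return None  # fewer than two frames: nothing to measure
    median_dt = statistics.median(dts_ms)
    duration_s = timestamps_s[-1] - timestamps_s[0]  # assumption: stream span
    return {
        "frame_count": len(timestamps_s),
        "effective_fps": len(timestamps_s) / duration_s if duration_s > 0 else None,
        "median_dt_ms": median_dt,
        "min_dt_ms": min(dts_ms),
        "max_dt_ms": max(dts_ms),
        "dt_std_ms": statistics.pstdev(dts_ms),
        # A gap is any interval greater than twice the median interval.
        "gap_count": sum(1 for dt in dts_ms if dt > 2 * median_dt),
    }


# Nominal 10 fps stream with one dropped stretch between 0.3 s and 0.7 s.
stats = interval_stats([0.0, 0.1, 0.2, 0.3, 0.7, 0.8])
print(stats["gap_count"], round(stats["median_dt_ms"], 1))
```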
Intrinsics
Same shape for both rgb.intrinsics and depth.intrinsics.
| Key | Type | Notes |
|---|---|---|
width, height | int | |
fx, fy, cx, cy | float | |
fx_over_fy | float | |
fov_x_deg, fov_y_deg | float | Derived from focal length and image dimensions. |
aspect_ratio | float | |
principal_offset_px | (float, float) | cx - w/2, cy - h/2. |
principal_offset_pct | (float, float) | Same, normalised to image size. |
distortion_model | str | E.g. "plumb_bob". |
distortion | list[float] | Distortion coefficient values. |
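
The derived fields follow from the raw intrinsics. The sketch below uses the standard pinhole relation fov = 2·atan(size / (2·f)); that Evaluate uses exactly this formula, and that principal_offset_pct is a percentage of width and height, are assumptions. `derived_intrinsics` is a hypothetical name.

```python
import math


def derived_intrinsics(width, height, fx, fy, cx, cy):
    """Derived intrinsics fields under a pinhole camera model."""
    off_x, off_y = cx - width / 2, cy - height / 2
    return {
        "fx_over_fy": fx / fy,
        "fov_x_deg": math.degrees(2 * math.atan(width / (2 * fx))),
        "fov_y_deg": math.degrees(2 * math.atan(height / (2 * fy))),
        "aspect_ratio": width / height,
        "principal_offset_px": (off_x, off_y),
        # Assumption: normalised per axis and expressed in percent.
        "principal_offset_pct": (100 * off_x / width, 100 * off_y / height),
    }


# Illustrative values only, chosen so both fields of view are 90 degrees.
d = derived_intrinsics(width=640, height=480, fx=320.0, fy=240.0, cx=330.0, cy=235.0)
print(d["fov_x_deg"], d["principal_offset_px"])
```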
Depth stream
metrics["depth"] is computed in a single pass over the depth frames with stride every_n = max(1, num_depth // 200), so a recording of any length finishes in a few seconds. The stats below come from those sampled frames.
| Key | Type | Notes |
|---|---|---|
frame_count | int | Full count, not the sampled count. |
effective_fps | float | |
median_dt_ms | float | |
sampled_frames | int | How many frames the stats below came from. |
valid_pct_mean | float | Average % of pixels with depth > 0. Colour-coded by depth_valid_thresholds. Feeds into the health score. |
valid_pct_min, valid_pct_max, valid_pct_std | float | |
empty_frame_count | int | Frames where every pixel is zero. |
global_min_m, global_max_m | float | Across all valid pixels in the sample set. |
depth_percentiles_m | dict | {"p5", "p50", "p95"} in metres. |
depth_hist_counts, depth_hist_pct | dict | Buckets <1m / 1-2m / 2-5m / >5m. |
intrinsics | dict | See Intrinsics. |
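
The stride rule and the per-frame statistics can be sketched as follows. The stride formula is taken from the text above; starting at index 0, treating a frame as a flat list of depths in metres, and the exact bucket boundaries are assumptions. All function names are hypothetical.

```python
def sampled_indices(num_depth, target=200):
    """Frame indices visited with stride every_n = max(1, num_depth // target)."""
    every_n = max(1, num_depth // target)
    return list(range(0, num_depth, every_n))  # assumption: sampling starts at 0


def valid_pct(depths_m):
    """Percentage of pixels with depth > 0 in one frame (flat list of metres)."""
    if not depths_m:
        return None
    return 100.0 * sum(1 for d in depths_m if d > 0) / len(depths_m)


def bucket_counts(depths_m):
    """Histogram of valid depths over the <1m / 1-2m / 2-5m / >5m buckets."""
    buckets = {"<1m": 0, "1-2m": 0, "2-5m": 0, ">5m": 0}
    for d in depths_m:
        if d <= 0:
            continue  # invalid pixel, excluded from the histogram
        if d < 1.0:
            buckets["<1m"] += 1
        elif d < 2.0:
            buckets["1-2m"] += 1
        elif d <= 5.0:
            buckets["2-5m"] += 1
        else:
            buckets[">5m"] += 1
    return buckets


print(len(sampled_indices(1000)), len(sampled_indices(450)))
print(bucket_counts([0.0, 0.5, 0.9, 1.5, 3.0, 6.0]))
```

With integer division the sample count is not fixed at 200: under these assumptions a 450-frame recording gets stride 2 and 225 samples, which is why sampled_frames is reported separately.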
Camera trajectory
metrics["trajectory"] — derived from session.all_camera_poses(). World frame is the MCAP's pose frame; height = Y.
| Key | Type | Notes |
|---|---|---|
pose_count | int | |
effective_rate_hz | float | |
path_length_m | float | Sum of segment lengths. |
net_displacement_m | float | Start-to-end distance. |
tortuosity | float | path_length_m / net_displacement_m. |
bbox_min, bbox_max, bbox_extents | list[float] | Axis-aligned bounding box in world frame. |
bbox_volume_m3 | float | |
footprint_area_m2 | float | Convex-hull area of positions projected onto the XZ plane. |
height_min_m, height_max_m, height_mean_m, height_std_m | float | Y-axis distribution. |
speed_mean_mps, speed_median_mps, speed_p95_mps, speed_max_mps | float | Segment length over segment dt. |
accel_mean_mps2, accel_max_mps2 | float | Finite difference of speed. |
yaw_rate_deg_per_s, pitch_rate_deg_per_s, roll_rate_deg_per_s | float | Median absolute angular rate per axis, from rotation-matrix ZYX Euler decomposition. |
cumulative_rotation_deg | float | Sum of the absolute heading changes. |
turn_count | int | Heading steps greater than 45°. |
stationary_duration_s, stationary_pct | float | Time with speed < 0.05 m/s. |
Plot inputs (ts_series, positions, speed_ts, speed_series, headings_deg) are also in the dict.
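
The path, speed, and stationary metrics can be approximated from positions and timestamps alone. A minimal sketch using the definitions in the table; `path_metrics` is a hypothetical name, and returning None for tortuosity when the net displacement is zero is an assumption.

```python
import math
import statistics


def path_metrics(positions, timestamps_s, stationary_speed=0.05):
    """Path length, displacement, tortuosity and speed stats for a pose track."""
    if len(positions) < 2:
        return None
    segs = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    dts = [t1 - t0 for t0, t1 in zip(timestamps_s, timestamps_s[1:])]
    pairs = [(s, dt) for s, dt in zip(segs, dts) if dt > 0]  # skip zero-dt steps
    speeds = [s / dt for s, dt in pairs]
    path_length = sum(segs)
    net = math.dist(positions[0], positions[-1])
    return {
        "path_length_m": path_length,
        "net_displacement_m": net,
        "tortuosity": path_length / net if net > 0 else None,
        "speed_mean_mps": statistics.fmean(speeds) if speeds else None,
        "speed_max_mps": max(speeds) if speeds else None,
        # Time spent in segments slower than the stationary threshold.
        "stationary_duration_s": sum(dt for s, dt in pairs if s / dt < stationary_speed),
    }


# Two moving segments (3 m, then 4 m) followed by a 2 s pause.
track = path_metrics(
    positions=[(0, 0, 0), (3, 0, 0), (3, 0, 4), (3, 0, 4)],
    timestamps_s=[0.0, 1.0, 2.0, 4.0],
)
print(track["path_length_m"], track["net_displacement_m"], track["tortuosity"])
```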
IMU
metrics["imu"] — from session.all_imu_samples().
| Key | Type | Notes |
|---|---|---|
sample_count | int | |
effective_rate_hz | float | |
rate_jitter_ms | float | Std-dev of inter-sample intervals. |
accel_axis_mean / std / min / max | list[float] | Per-axis (x, y, z) m/s². |
accel_mag_mean / std / p95 / max | float | Magnitude statistics. |
gyro_axis_mean / std / min / max | list[float] | Per-axis rad/s. |
gyro_mag_mean / std / p95 / max | float | |
gravity_vector | list[float] | Mean accel vector — approximates the gravity direction. |
gravity_magnitude | float | Norm of the mean accel vector. |
gravity_deviation | float | Absolute difference between 9.81 and gravity_magnitude. Colour-coded by imu_gravity_max_dev. |
jolt_count | int | Samples whose accel magnitude exceeds 20 m/s². |
high_rotation_events | int | Samples whose gyro magnitude exceeds 2 rad/s. |
motion_duration_s, still_duration_s | float | Time accel-mag is above / below the gravity-normalised threshold. |
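
The gravity fields and jolt_count follow directly from the accel samples. A minimal sketch with the thresholds from the table; `gravity_stats` is a hypothetical name and samples are assumed to be (x, y, z) tuples in m/s².

```python
import math


def gravity_stats(accel_samples, g=9.81, jolt_threshold=20.0):
    """Gravity estimate and jolt count from (x, y, z) accel samples in m/s^2."""
    if not accel_samples:
        return None
    n = len(accel_samples)
    # The mean accel vector approximates the gravity direction.
    mean_vec = [sum(s[i] for s in accel_samples) / n for i in range(3)]
    magnitude = math.hypot(*mean_vec)
    return {
        "gravity_vector": mean_vec,
        "gravity_magnitude": magnitude,
        "gravity_deviation": abs(g - magnitude),
        "jolt_count": sum(1 for s in accel_samples if math.hypot(*s) > jolt_threshold),
    }


imu = gravity_stats([(0.0, 9.6, 0.0), (0.0, 10.0, 0.0), (0.0, 9.8, 0.0)])
print(round(imu["gravity_magnitude"], 3), round(imu["gravity_deviation"], 3))
```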
Tracking state
metrics["tracking_state"] — from /camera/tracking_state if present.
| Key | Type | Notes |
|---|---|---|
message_count | int | |
state_counts | dict[str, int] | Counts keyed by state_str from the tracking-state message. |
state_pct | dict[str, float] | Percentages of total messages. |
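
The counts and percentages are a straightforward tally over the state strings. A minimal sketch; `state_breakdown` is a hypothetical name and the state names "TRACKING" and "LOST" are invented for illustration, not taken from the message definition.

```python
from collections import Counter


def state_breakdown(states):
    """Tally state strings into counts and percentages of total messages."""
    counts = Counter(states)
    total = sum(counts.values())
    pct = {s: 100.0 * c / total for s, c in counts.items()} if total else {}
    return {"message_count": total, "state_counts": dict(counts), "state_pct": pct}


# Hypothetical state strings; real values come from state_str in each message.
ts_block = state_breakdown(["TRACKING", "TRACKING", "LOST", "TRACKING"])
print(ts_block["state_counts"], ts_block["state_pct"])
```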
TF transforms
metrics["tf"] — every /tf message decoded once.
| Key | Type | Notes |
|---|---|---|
message_count | int | |
unique_pair_count | int | |
pairs | list[dict] | One row per parent→child pair: {parent, child, count, rate_hz}. |
Trajectory topic
metrics["trajectory_topic"] — the /trajectory topic (separate from /camera/pose).
| Key | Type | Notes |
|---|---|---|
pose_count | int | |
path_length_m | float | Same calculation as in trajectory, but from the /trajectory poses. |
Mesh
metrics["mesh"] — from /map/mesh.
| Key | Type | Notes |
|---|---|---|
vertex_count, face_count | int | |
bbox_extents, bbox_volume_m3 | list[float], float | |
surface_area_m2 | float | Sum of triangle areas via cross product. |
edge_length_mean_m, edge_length_p5_m, edge_length_p95_m | float | Edge-length distribution across all triangles. |
color_coverage_pct | float | % of vertices that aren't the default grey ([128,128,128]). |
verts_per_m2, faces_per_m2 | float | Vertex and face counts divided by surface area. |
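
The surface area is the sum of per-triangle areas, each half the norm of the cross product of two edge vectors. A minimal sketch, assuming vertices are (x, y, z) tuples in metres and faces are index triples; the function names are hypothetical.

```python
import math


def triangle_area(a, b, c):
    """Area of triangle abc: half the norm of (b - a) x (c - a)."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cross = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    return 0.5 * math.hypot(*cross)


def surface_area(vertices, faces):
    """Total area over all triangles given as vertex-index triples."""
    return sum(triangle_area(vertices[i], vertices[j], vertices[k]) for i, j, k in faces)


# Unit square in the XY plane split into two triangles.
verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]
area = surface_area(verts, faces)
print(area, len(verts) / area, len(faces) / area)
```

The last two printed values correspond to verts_per_m2 and faces_per_m2 for this toy mesh.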
Point cloud
metrics["point_cloud"] — from /map/mesh_cloud if present, else /map/point_cloud.
| Key | Type | Notes |
|---|---|---|
point_count | int | |
bbox_extents, bbox_volume_m3 | list[float], float | |
density_pts_per_m3 | float | |
color_coverage_pct | float | % of points with non-zero RGB. |
Sync quality
metrics["sync"] — nearest-neighbour offsets between RGB timestamps and each other stream. Each sub-block has:
| Key | Type | Notes |
|---|---|---|
median_ms, p95_ms, max_ms | float | Absolute time-offset statistics across all RGB timestamps. |
within_50ms_pct, within_100ms_pct | float | Percentage of RGB frames whose nearest match is within the bucket. The 50 ms number is colour-coded by sync_thresholds and feeds into the health score. |
Sub-blocks:
- rgb_vs_depth
- rgb_vs_pose
- rgb_vs_imu
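
Nearest-neighbour offsets against a sorted stream can be found with a binary search per RGB timestamp. A minimal sketch, not Evaluate's implementation: the function names are hypothetical, the bucket bounds are treated as inclusive by assumption, and p95_ms is left out because the percentile convention is not stated here.

```python
import bisect
import statistics


def nearest_offsets_ms(rgb_ts, other_ts):
    """Absolute offset in ms from each RGB timestamp to its nearest neighbour."""
    other = sorted(other_ts)
    if not other:
        return []
    offsets = []
    for t in rgb_ts:
        i = bisect.bisect_left(other, t)
        # Only the entries just before and just after t can be nearest.
        candidates = other[max(0, i - 1): i + 1]
        offsets.append(min(abs(t - c) for c in candidates) * 1000.0)
    return offsets


def sync_block(rgb_ts, other_ts):
    """Summary stats for one sub-block such as rgb_vs_depth."""
    offs = nearest_offsets_ms(rgb_ts, other_ts)
    if not offs:
        return None
    n = len(offs)
    return {
        "median_ms": statistics.median(offs),
        "max_ms": max(offs),
        "within_50ms_pct": 100.0 * sum(1 for o in offs if o <= 50.0) / n,
        "within_100ms_pct": 100.0 * sum(1 for o in offs if o <= 100.0) / n,
    }


sync = sync_block([0.0, 0.1, 0.2, 0.3], [0.01, 0.12, 0.28, 0.5])
print(sync["within_50ms_pct"], sync["within_100ms_pct"])
```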
Hands
metrics["hands"] — only populated when session.add_hand_pose(frame.index, hands) was called during the loop.
Three headline buckets are reported alongside the per-side detection rates:
- ≥1 hand (frames_with_1plus_hand*): at least one detection in the frame. Same value as frames_with_any_hand and the natural "did we see hands at all" KPI.
- Exactly 2 hands (frames_with_2_hands*): both hands cleanly detected, no spurious extras. The most useful KPI for two-handed manipulation tasks.
- More than 2 hands (frames_with_more_hands*): typically a detection error (false positive on background, mirrored reflection, second person in frame). Reported as a metric only; no health-score deduction by default.
The pie chart in the report shows the disjoint exact-count buckets (no hands / exactly 1 / exactly 2 / >2) derived on the fly from counts_per_frame.
| Key | Type | Notes |
|---|---|---|
backend | str | E.g. "wilor", "mediapipe", "hamer". |
frames_total | int | Same as num_rgb_frames. |
frames_with_any_hand | int | Frames with ≥1 hand (alias for frames_with_1plus_hand). |
frames_with_any_hand_pct | float | Colour-coded by hand_any_thresholds. Feeds into the health score. |
frames_with_1plus_hand, frames_with_1plus_hand_pct | int, float | Frames with at least 1 hand. Coloured by hand_1plus_thresholds. |
frames_with_2_hands, frames_with_2_hands_pct | int, float | Frames with exactly 2 hands. Coloured by hand_2_thresholds. |
frames_with_more_hands, frames_with_more_hands_pct | int, float | Frames with strictly more than 2 hands. Metric only — no score deduction by default. |
counts_per_frame | np.ndarray (frames_total,) | Number of hand detections per frame (clipped at 100). Used by the pie chart to derive exact buckets. |
left_detection_pct, right_detection_pct | float | Per-side rate. |
both_hands_pct | float | Frames with both sides detected. |
left_conf_mean, left_conf_p10 / p50 / p90 | float | Per-side confidence stats. Same for right_conf_*. |
has_3d | bool | True if any hand had non-zero z. |
mano_frames_pct | float | % of any-hand frames where MANO vertices were attached. None for non-MANO backends. |
kpts_in_frame_pct | float | Of all 2D keypoints, the percentage that fall inside the RGB image bounds. |
left_wrist_depth_mean_m, right_wrist_depth_mean_m | float | Mean Z (camera-frame depth) of the wrist joint. |
palm_width_mean_m | float | Distance between MCP joints 5 and 17. |
grip_closure_mean_m | float | Mean tip-to-wrist distance across the 5 fingertips. |
left_wrist_track, right_wrist_track | dict | {length_m, speed_mean_mps, speed_max_mps} in the camera frame, assuming even frame timing. |
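
The headline buckets and the disjoint pie-chart buckets can both be derived from counts_per_frame. A minimal sketch in which a plain list stands in for the NumPy array; `hand_buckets` and the pie bucket labels are hypothetical names.

```python
def hand_buckets(counts_per_frame):
    """Headline and exact-count buckets from per-frame hand detection counts."""
    total = len(counts_per_frame)
    if total == 0:
        return None
    one_plus = sum(1 for c in counts_per_frame if c >= 1)
    exactly_1 = sum(1 for c in counts_per_frame if c == 1)
    exactly_2 = sum(1 for c in counts_per_frame if c == 2)
    more = sum(1 for c in counts_per_frame if c > 2)
    return {
        "frames_total": total,
        "frames_with_1plus_hand": one_plus,
        "frames_with_1plus_hand_pct": 100.0 * one_plus / total,
        "frames_with_2_hands": exactly_2,
        "frames_with_2_hands_pct": 100.0 * exactly_2 / total,
        "frames_with_more_hands": more,
        "frames_with_more_hands_pct": 100.0 * more / total,
        # Disjoint buckets as shown in the pie chart; they sum to frames_total.
        "pie": {
            "no_hands": total - one_plus,
            "exactly_1": exactly_1,
            "exactly_2": exactly_2,
            "more_than_2": more,
        },
    }


hb = hand_buckets([0, 1, 2, 2, 3, 0, 1, 2])
print(hb["frames_with_1plus_hand_pct"], hb["frames_with_2_hands_pct"], hb["pie"])
```

The headline buckets overlap (a 2-hand frame also counts as a ≥1-hand frame), so only the pie buckets add up to frames_total.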
Skeleton
metrics["skeleton"] — only when you pass skeleton=... to Evaluate.
| Key | Type | Notes |
|---|---|---|
frame_count | int | |
detection_pct | float | Present-frame ratio as a percentage. Currently always 100. |
joint_visibility_pct | dict[str, float] | Per-joint visibility (head, neck, spine, l_shoulder, l_elbow, l_wrist, r_shoulder, r_elbow, r_wrist, mount_cam). |
elbow_left_mean_deg, elbow_right_mean_deg | float | Angle at the elbow joint, in degrees. |
reach_left_mean_m, reach_right_mean_m | float | Shoulder-to-wrist distance. |
head_height_mean_m | float | Mean Y of the head joint. |
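
The elbow angle and reach can be computed per frame from three joint positions. A minimal sketch, assuming joints are (x, y, z) tuples in metres and that a fully extended arm reads as 180°; the convention Evaluate uses is not confirmed here, and the function names are hypothetical.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at joint b, in degrees, between the segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return None  # degenerate: two joints coincide
    cos_angle = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def reach_m(shoulder, wrist):
    """Straight-line shoulder-to-wrist distance."""
    return math.dist(shoulder, wrist)


shoulder, elbow, wrist = (0.0, 1.4, 0.0), (0.3, 1.4, 0.0), (0.3, 1.4, 0.3)
print(joint_angle_deg(shoulder, elbow, wrist), round(reach_m(shoulder, wrist), 4))
```

Averaging these per-frame values over the recording would give quantities comparable to elbow_left_mean_deg and reach_left_mean_m.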
Streamed rgb.mp4
metrics["streamed_rgb"] — only when session.add_rgb_frame was used.
| Key | Type | Notes |
|---|---|---|
active | bool | True while the H.264 writer is still open. |
tmp_path, tmp_size_bytes | str, int | Where the temp mp4 lives and how big it is at compute time. |
width, height, fps | int, float | Pulled from session.rgb_intrinsics. |
has_thumbnail | bool | Whether session._rgb_mid_frame was captured. |
Health
metrics["health"] — the rollup. See Health score for how it's computed.
| Key | Type | Notes |
|---|---|---|
score | float | 0..100, clamped. |
notes | list[str] | One human-readable line per active deduction. Rendered next to the score. |
Config snapshot
metrics["config"] — the EvaluateConfig instance used for the run. Surfaced so the report can dump the active thresholds and so downstream code can diff configs.
Plot input arrays (ts_series, positions, accel_mag_series, global_depth_samples, etc.) are kept on the dict alongside the scalars so plotting and metric layers stay decoupled. They're safe to ignore from Python.