
Reading an MCAP

Open a Stera recording with MCAPReader, iterate over raw streams or synced frames, and access intrinsics, IMU samples, and TF transforms.

Open a recording

from stera.data import MCAPReader

session = MCAPReader("recording.mcap")
print(session.duration)           # seconds
print(session.num_rgb_frames)
print(session.num_depth_frames)
print(session.rgb_intrinsics.K)   # (3, 3) numpy array

MCAPReader is permissive by default: missing topics produce empty iterators or None fields, never errors. Pass check_format=True to enforce the reference topic fingerprint:

session = MCAPReader("recording.mcap", check_format=True)
# Raises ValueError if any of /camera/rgb/compressed, /camera/depth,
# /camera/pose, /device/imu, /tf, /trajectory, etc. are missing.

Two iteration modes

Raw per-stream iterators

When you only care about one stream, iterate it directly. Each method yields (timestamp_seconds, decoded_message).

RGB: (H, W, 3) uint8 frames:

for ts, rgb in session.rgb_frames():
    ...

Depth: (H, W) uint16 millimetres:

for ts, depth in session.depth_frames():
    ...

Camera pose: Pose6D in the world frame:

for ts, pose in session.camera_poses():
    ...

IMU: dict with linear_acceleration, angular_velocity, orientation:

for ts, imu in session.imu_samples():
    print(imu["linear_acceleration"], imu["angular_velocity"], imu["orientation"])

Tracking state: SLAM tracking status per timestamp:

for ts, state in session.tracking_states():
    ...

Synced frames

Almost every pipeline wants RGB + depth + pose + IMU paired. session.frames() does the matching for you:

for frame in session.frames():
    frame.rgb            # (H, W, 3) uint8
    frame.depth          # (H, W) uint16 mm or None
    frame.camera_pose    # Pose6D or None
    frame.imu            # dict or None
    frame.depth_K        # (3, 3) intrinsics
    frame.rgb_K          # (3, 3) intrinsics
    frame.timestamp      # seconds (RGB clock)
    frame.index          # 0-based

The matching pairs each RGB frame with the nearest-neighbour timestamp in every other stream, subject to a max-delta cutoff. Tighten or loosen the cutoffs with keyword arguments:

for frame in session.frames(max_depth_dt=0.03, max_pose_dt=0.05):
    ...

See Synced frames for how the sync algorithm works.
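
Nearest-neighbour matching with a cutoff can be illustrated in a few lines of standard-library Python. This is a minimal sketch of the general technique, not the SDK's implementation; the helper name nearest_within and the sample timestamps are hypothetical.

```python
from bisect import bisect_left

def nearest_within(sorted_ts, query, max_dt):
    """Return the index of the timestamp closest to `query`, or None if
    even the closest one is more than `max_dt` seconds away.

    `sorted_ts` must be sorted ascending. Hypothetical helper, for illustration.
    """
    if not sorted_ts:
        return None
    pos = bisect_left(sorted_ts, query)
    # The nearest neighbour is either just before `pos` or at `pos`.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_ts)]
    best = min(candidates, key=lambda i: abs(sorted_ts[i] - query))
    return best if abs(sorted_ts[best] - query) <= max_dt else None

# Made-up timestamps: RGB at roughly 30 Hz, depth with one dropped frame.
rgb_ts = [0.000, 0.033, 0.066, 0.100]
depth_ts = [0.002, 0.035, 0.101]

matches = [nearest_within(depth_ts, t, max_dt=0.03) for t in rgb_ts]
print(matches)  # [0, 1, None, 2]
```

The third RGB timestamp has no depth sample within 30 ms, so it maps to None. This is analogous to frame.depth being None for a synced frame.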

Bulk accessors

When you want all of a stream at once (e.g. building a trajectory plot):

poses = session.all_camera_poses()    # list[(ts, Pose6D)]
imu   = session.all_imu_samples()     # list[(ts, dict)]
tfs   = session.tf_transforms()       # list[(ts, parent, child, Pose6D)]
traj  = session.trajectory()          # list[(ts, Pose6D)] from /trajectory topic

Each result is cached after the first call, so calling session.all_camera_poses() twice does not decode the stream again.

Intrinsics

rgb_intr   = session.rgb_intrinsics    # CameraIntrinsics or None
depth_intr = session.depth_intrinsics

print(rgb_intr.width, rgb_intr.height)
print(rgb_intr.K)                       # (3, 3)
print(rgb_intr.D)                       # distortion coefficients
print(rgb_intr.distortion_model)        # "plumb_bob"

If your MCAP has no separate depth camera info topic, depth_intrinsics falls back to rgb_intrinsics scaled to the depth image resolution.
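
One common way to scale a pinhole matrix to another resolution is to multiply the focal lengths and principal point by the width and height ratios. The sketch below shows that idea with NumPy. It is not necessarily how the SDK computes its fallback; scale_intrinsics and the example numbers are hypothetical, and the half-pixel offset that some conventions apply to the principal point is ignored.

```python
import numpy as np

def scale_intrinsics(K, src_size, dst_size):
    """Rescale a 3x3 pinhole matrix from src (width, height) to dst (width, height).

    Hypothetical helper, for illustration only.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # Scaling row 0 by sx and row 1 by sy rescales fx, cx and fy, cy.
    S = np.diag([sx, sy, 1.0])
    return S @ np.asarray(K, dtype=np.float64)

# Made-up RGB intrinsics for a 1920x1080 image.
K_rgb = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])

K_depth = scale_intrinsics(K_rgb, src_size=(1920, 1080), dst_size=(640, 360))
print(K_depth)
```

With a 3x reduction in both axes, the focal lengths and the principal point each shrink by a factor of three.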

The optical-frame to link-frame rotation is read from /tf once and cached on the session:

R = session.R_optical_to_link    # (3, 3)

When the recording has no /tf messages, the SDK falls back to R_OPTICAL_TO_LINK (the identity-like rotation for the standard rig orientation). See Coordinate frames.

Map geometry

Two ways to get a 3D map out of the recording:

# Triangle mesh from /map/mesh
mesh = session.mesh()              # None if no /map/mesh topic
if mesh is not None:
    verts, faces, colors = mesh

# Point cloud (auto: /map/mesh_cloud, fallback /map/point_cloud)
xyz, rgb = session.point_cloud(source="auto")

Or build a dense colored cloud from depth frames yourself:

xyz, rgb = session.dense_point_cloud(
    every_n=10,                # use every 10th frame
    voxel_size=0.02,           # 2 cm voxel grid
    cam_exclude_radius=1.0,    # drop points within 1 m of the camera
)

See Map geometry for the full menu.
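
To make the idea of a depth-derived cloud concrete, here is a minimal NumPy sketch of two of the steps involved: back-projecting a uint16 millimetre depth image through a pinhole matrix, and voxel-grid downsampling. It illustrates the general technique rather than the internals of dense_point_cloud. The helper names are hypothetical, points stay in the camera frame (no pose is applied), and treating zero depth as "no measurement" is an assumption.

```python
import numpy as np

def backproject(depth_mm, K):
    """Turn an (H, W) uint16 depth image in millimetres into (N, 3)
    camera-frame points in metres. Hypothetical helper."""
    h, w = depth_mm.shape
    z = depth_mm.astype(np.float64) / 1000.0          # mm -> m
    u, v = np.meshgrid(np.arange(w), np.arange(h))    # pixel coordinates
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]   # assumption: zero depth means no measurement

def voxel_downsample(points, voxel_size):
    """Keep the first point seen in each occupied voxel. Hypothetical helper."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, first = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first)]

# Tiny made-up depth image: everything at 1 m except one invalid pixel.
depth = np.full((2, 2), 1000, dtype=np.uint16)
depth[0, 0] = 0
K = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
pts = backproject(depth, K)

# Two points share a 2 cm voxel, so only one of them survives.
cloud = np.array([[0.001, 0.0, 0.0],
                  [0.002, 0.0, 0.0],
                  [0.51, 0.0, 0.0]])
down = voxel_downsample(cloud, voxel_size=0.02)
print(pts.shape, down.shape)
```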

Buffering during the loop

The session holds two buffers that a later session.export(...) call consumes:

# blur and tracker are assumed to have been created earlier
for frame in session.frames():
    blurred = blur.blur(frame)
    hands   = tracker.detect_hands(frame)

    session.add_rgb_frame(frame.index, blurred)   # → rgb.mp4
    session.add_hand_pose(frame.index, hands)     # → annotation.hdf5

add_rgb_frame lazily opens an internal H.264 writer the first time it's called. add_hand_pose accumulates per-frame HandPose lists keyed by frame index. Neither is mandatory: skip them if you don't need that output.

add_rgb_frame lets you write post-processed frames (face-blurred, annotated overlays) into the episode video without rolling your own ffmpeg pipeline. The writer is sequential: frames must be added in iteration order.

Common patterns

Limit a run for testing

for i, frame in enumerate(session.frames()):
    if i >= 500:
        break
    ...
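
An equivalent way to cap a loop is itertools.islice, which works on any Python iterator. The sketch below uses a stand-in generator (fake_frames, hypothetical) so it runs without a recording; assuming session.frames() returns an ordinary iterator, the same pattern would read for frame in islice(session.frames(), 500).

```python
from itertools import islice

def fake_frames():
    """Stand-in for session.frames(); yields frame indices forever."""
    i = 0
    while True:
        yield i
        i += 1

# Take only the first 500 items without an explicit counter or break.
first = list(islice(fake_frames(), 500))
print(len(first), first[-1])  # 500 499
```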

Skip frames with missing streams

for frame in session.frames():
    if frame.depth is None or frame.camera_pose is None:
        continue
    ...

Custom topic names

If your rig uses non-default topic names, override at construction:

from stera.data.mcap import TopicConfig

topics = TopicConfig(
    rgb="/myrig/rgb",
    depth="/myrig/depth",
    camera_pose="/myrig/pose",
)
session = MCAPReader("recording.mcap", topics=topics)

See also