Reading an MCAP
Open a Stera recording with MCAPReader, iterate raw streams or synced frames, and access intrinsics, IMU, and TF transforms.
Open a recording
from stera.data import MCAPReader
session = MCAPReader("recording.mcap")
print(session.duration) # seconds
print(session.num_rgb_frames)
print(session.num_depth_frames)
print(session.rgb_intrinsics.K) # (3, 3) numpy array
MCAPReader is permissive by default: missing topics produce empty iterators or None fields, never errors. Pass check_format=True to enforce the reference topic fingerprint:
session = MCAPReader("recording.mcap", check_format=True)
# Raises ValueError if any of /camera/rgb/compressed, /camera/depth,
# /camera/pose, /device/imu, /tf, /trajectory, etc. are missing.
Two iteration modes
Raw per-stream iterators
When you only care about one stream, iterate it directly. Each method yields (timestamp_seconds, decoded_message).
RGB: (H, W, 3) uint8 frames:
for ts, rgb in session.rgb_frames():
    ...
Depth: (H, W) uint16 millimetres:
for ts, depth in session.depth_frames():
    ...
Camera pose: Pose6D in the world frame:
for ts, pose in session.camera_poses():
    ...
IMU: dict with linear_acceleration, angular_velocity, orientation:
for ts, imu in session.imu_samples():
    print(imu["linear_acceleration"], imu["angular_velocity"], imu["orientation"])
Tracking state: SLAM tracking status per timestamp:
for ts, state in session.tracking_states():
    ...
Synced frames
Almost every pipeline wants RGB + depth + pose + IMU paired. session.frames() does the matching for you:
for frame in session.frames():
    frame.rgb # (H, W, 3) uint8
    frame.depth # (H, W) uint16 mm or None
    frame.camera_pose # Pose6D or None
    frame.imu # dict or None
    frame.depth_K # (3, 3) intrinsics
    frame.rgb_K # (3, 3) intrinsics
    frame.timestamp # seconds (RGB clock)
    frame.index # 0-based
The matching pairs each RGB frame with the nearest-timestamp sample from the other streams, subject to a max-delta cutoff. Tighten or loosen the cutoffs with keyword arguments:
for frame in session.frames(max_depth_dt=0.03, max_pose_dt=0.05):
    ...
See Synced frames for how the sync algorithm works.
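The nearest-neighbour matching with a max-delta cutoff can be sketched in plain Python. This is an illustration of the idea, not the SDK's implementation: nearest_within and its parameters are hypothetical names, and the sketch assumes each stream's timestamps are sorted in ascending order.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def nearest_within(ref_ts: float, stream_ts: Sequence[float], max_dt: float) -> Optional[int]:
    """Return the index of the timestamp in stream_ts closest to ref_ts,
    or None if even the closest one is more than max_dt seconds away.

    Assumes stream_ts is sorted in ascending order.
    """
    if not stream_ts:
        return None
    pos = bisect_left(stream_ts, ref_ts)
    # The nearest sample is one of the two neighbours of the insertion point.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(stream_ts)]
    best = min(candidates, key=lambda i: abs(stream_ts[i] - ref_ts))
    return best if abs(stream_ts[best] - ref_ts) <= max_dt else None


# Synthetic timestamps (seconds): RGB is the reference clock.
rgb_ts = [0.000, 0.033, 0.066, 0.100]
depth_ts = [0.001, 0.040, 0.200]
matches = [nearest_within(t, depth_ts, max_dt=0.03) for t in rgb_ts]
print(matches)
```

In this sketch the last RGB frame gets no depth match because its nearest depth sample is 60 ms away, which corresponds to a frame whose depth field is None.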
Bulk accessors
When you want all of a stream at once (e.g. building a trajectory plot):
poses = session.all_camera_poses() # list[(ts, Pose6D)]
imu = session.all_imu_samples() # list[(ts, dict)]
tfs = session.tf_transforms() # list[(ts, parent, child, Pose6D)]
traj = session.trajectory() # list[(ts, Pose6D)] from /trajectory topic
Each is cached after the first call: calling session.all_camera_poses() twice doesn't re-decode.
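Because the bulk accessors return plain lists of (ts, payload) tuples, quick sanity checks need no SDK code. The sketch below estimates a stream's mean rate and flags dropouts; stream_stats and gap_factor are hypothetical names, and the input is synthetic data standing in for session.all_imu_samples().

```python
from typing import Any, List, Sequence, Tuple


def stream_stats(
    samples: Sequence[Tuple[float, Any]], gap_factor: float = 2.0
) -> Tuple[float, List[Tuple[float, float]]]:
    """Return (mean_rate_hz, gaps) for a time-ordered list of (ts, payload).

    gaps holds (start_ts, end_ts) pairs whose spacing exceeds gap_factor
    times the median spacing between consecutive samples.
    """
    ts = [t for t, _ in samples]
    if len(ts) < 2:
        return 0.0, []
    dts = [b - a for a, b in zip(ts, ts[1:])]
    span = ts[-1] - ts[0]
    rate = (len(ts) - 1) / span if span > 0 else 0.0
    median_dt = sorted(dts)[len(dts) // 2]
    gaps = [(a, b) for a, b in zip(ts, ts[1:]) if (b - a) > gap_factor * median_dt]
    return rate, gaps


# Synthetic stand-in: 100 Hz samples with one half-second dropout.
fake_imu = [(i * 0.01, {}) for i in range(50)] + [(1.0 + i * 0.01, {}) for i in range(50)]
rate, gaps = stream_stats(fake_imu)
print(f"{rate:.1f} Hz, {len(gaps)} gap(s)")
```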
Intrinsics
rgb_intr = session.rgb_intrinsics # CameraIntrinsics or None
depth_intr = session.depth_intrinsics
print(rgb_intr.width, rgb_intr.height)
print(rgb_intr.K) # (3, 3)
print(rgb_intr.D) # distortion coefficients
print(rgb_intr.distortion_model) # "plumb_bob"
If your MCAP has no separate depth camera info topic, depth_intrinsics falls back to rgb_intrinsics scaled to the depth image resolution.
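Scaling a pinhole matrix to another resolution usually means multiplying the focal lengths and principal point by the width and height ratios. The sketch below shows that arithmetic with NumPy; scale_intrinsics is a hypothetical helper, the example values are made up, and the SDK's fallback may handle details such as half-pixel offsets differently.

```python
import numpy as np


def scale_intrinsics(K: np.ndarray, src_size, dst_size) -> np.ndarray:
    """Rescale a 3x3 pinhole matrix from src_size to dst_size, both (width, height)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # Row 0 (fx, skew, cx) scales with width, row 1 (fy, cy) with height.
    return np.diag([sx, sy, 1.0]) @ K


# Made-up RGB intrinsics for a 1920x1080 image.
K_rgb = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])
K_depth = scale_intrinsics(K_rgb, src_size=(1920, 1080), dst_size=(640, 360))
print(K_depth)
```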
TF and the optical-to-link rotation
The optical-frame to link-frame rotation is read from /tf once and cached on the session:
R = session.R_optical_to_link # (3, 3)
When the recording has no /tf messages, the SDK falls back to R_OPTICAL_TO_LINK (the identity-like rotation for the standard rig orientation). See Coordinate frames.
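A (3, 3) rotation like this is typically applied to points or direction vectors expressed in the optical frame. The sketch below assumes the matrix maps optical-frame coordinates to link-frame coordinates as p_link = R @ p_optical. R_example is an illustrative value following the common ROS convention (optical: x right, y down, z forward; link: x forward, y left, z up) and is not taken from the SDK, whose R_OPTICAL_TO_LINK may differ.

```python
import numpy as np

# Illustrative value only (common ROS optical-to-body convention).
# The SDK's actual R_OPTICAL_TO_LINK may differ.
R_example = np.array([[0.0, 0.0, 1.0],
                      [-1.0, 0.0, 0.0],
                      [0.0, -1.0, 0.0]])


def optical_to_link(points_optical: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Rotate (N, 3) points from the optical frame into the link frame."""
    # Row-vector form of p_link = R @ p_optical for every point.
    return points_optical @ R.T


p_optical = np.array([[0.0, 0.0, 2.0]])  # 2 m straight ahead of the lens
p_link = optical_to_link(p_optical, R_example)
print(p_link)
```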
Map geometry
Two ways to get a 3D map out of the recording:
# Triangle mesh from /map/mesh
verts, faces, colors = session.mesh()
# (None if no /map/mesh topic)
# Point cloud (auto: /map/mesh_cloud, fallback /map/point_cloud)
xyz, rgb = session.point_cloud(source="auto")
Or build a dense colored cloud from depth frames yourself:
xyz, rgb = session.dense_point_cloud(
    every_n=10, # use every 10th frame
    voxel_size=0.02, # 2 cm voxel grid
    cam_exclude_radius=1.0, # drop points within 1 m of the camera
)
See Map geometry for the full menu.
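To make the voxel_size and cam_exclude_radius parameters concrete, here is a minimal single-frame sketch in NumPy: back-project a depth image through the pinhole model, drop points near the camera, then keep one point per voxel. It is not the SDK's implementation. The helper names are hypothetical, zero depth is assumed to mean "invalid", points stay in the camera frame (no pose is applied), and colour is ignored.

```python
import numpy as np


def backproject(depth_mm: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Convert an (H, W) uint16 depth image in millimetres into (N, 3)
    camera-frame points in metres. Zero depth is treated as invalid."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float64) / 1000.0
    valid = z > 0
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x[valid], y[valid], z[valid]], axis=1)


def exclude_near(xyz: np.ndarray, radius: float) -> np.ndarray:
    """Drop points closer than radius metres to the camera origin."""
    return xyz[np.linalg.norm(xyz, axis=1) >= radius]


def voxel_downsample(xyz: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep the first point encountered in each occupied voxel."""
    keys = np.floor(xyz / voxel_size).astype(np.int64)
    _, first = np.unique(keys, axis=0, return_index=True)
    return xyz[np.sort(first)]


# Tiny synthetic frame: a flat wall 2 m away, one invalid and one near pixel.
K = np.array([[200.0, 0.0, 1.5], [0.0, 200.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 2000, dtype=np.uint16)
depth[0, 0] = 0    # invalid pixel
depth[3, 3] = 500  # 0.5 m away, inside the exclusion radius
pts = backproject(depth, K)
pts = exclude_near(pts, radius=1.0)
pts = voxel_downsample(pts, voxel_size=0.02)
print(pts.shape)
```

A multi-frame version would additionally transform each frame's points into the world frame with the camera pose before merging and downsampling.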
Buffering during the loop
The session keeps two buffers that a later session.export(...) call consumes:
for frame in session.frames():
    blurred = blur.blur(frame)
    hands = tracker.detect_hands(frame)
    session.add_rgb_frame(frame.index, blurred) # → rgb.mp4
    session.add_hand_pose(frame.index, hands) # → annotation.hdf5
add_rgb_frame lazily opens an internal H.264 writer the first time it's called. add_hand_pose accumulates per-frame HandPose lists keyed by frame index. Neither is mandatory: skip them if you don't need that output.
add_rgb_frame lets you write post-processed frames (face-blurred,
annotated overlays) into the episode video without rolling your own
ffmpeg pipeline. The writer is sequential: frames must be added in
iteration order.
Common patterns
Limit a run for testing
for i, frame in enumerate(session.frames()):
    if i >= 500:
        break
    ...
Skip a stream gracefully
for frame in session.frames():
    if frame.depth is None or frame.camera_pose is None:
        continue
    ...
Custom topic names
If your rig uses non-default topic names, override at construction:
from stera.data.mcap import TopicConfig
topics = TopicConfig(
    rgb="/myrig/rgb",
    depth="/myrig/depth",
    camera_pose="/myrig/pose",
)
session = MCAPReader("recording.mcap", topics=topics)
See also
- MCAPReader API: full reference.
- Synced frames: how sync works.
- Coordinate frames: optical / link / world.