Output formats
What Stera reads in (MCAP) and what it writes out (HDF5, MP4, PLY, RRD).
A Stera session has one input and one output: an MCAP recording going in, an episode directory coming out.
Input — MCAP
Each recording from the Stera App is a single .mcap file. It bundles every sensor stream — RGB, depth, ARKit pose, IMU — on a shared clock.
Process reads MCAPs through MCAPReader:
    from stera.data import MCAPReader

    reader = MCAPReader("recording.mcap")
    for frame in reader.frames():
        rgb, depth, pose = frame.rgb, frame.depth, frame.cam_pose

Full reference: Process > Guides > Reading MCAP.
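Before handing a batch of recordings to the reader, it can help to confirm that each file is a complete MCAP. The sketch below is not part of the Stera API: `looks_like_mcap` is a hypothetical helper that relies only on the MCAP format itself, which opens and closes every file with the same 8-byte magic sequence. It catches truncated or misnamed files but does not validate the streams inside.

```python
import os
from pathlib import Path

# 8-byte magic that both opens and closes an MCAP file (from the MCAP spec).
MCAP_MAGIC = b"\x89MCAP0\r\n"


def looks_like_mcap(path):
    """Return True if the file starts and ends with the MCAP magic bytes.

    Hypothetical helper, not a Stera function. A recording that was cut off
    mid-write or mid-transfer will usually be missing the trailing magic.
    """
    path = Path(path)
    if path.stat().st_size < 2 * len(MCAP_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(len(MCAP_MAGIC))
        f.seek(-len(MCAP_MAGIC), os.SEEK_END)
        tail = f.read(len(MCAP_MAGIC))
    return head == MCAP_MAGIC and tail == MCAP_MAGIC
```

A typical use is to filter a folder of recordings with this check and log the rejects before any processing starts.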
Output — Episode directory
After processing, session.export(out_dir) writes a complete episode you can drop straight into a training pipeline.
| File | Format | What it is |
|---|---|---|
| rgb.mp4 | H.264 MP4 | Original RGB video at capture framerate. |
| mesh.ply | PLY | Scene mesh reconstructed from depth + ARKit. |
| thumbnail.jpg | JPEG | One-frame preview, useful for dataset browsers. |
| annotation.hdf5 | HDF5 | All time-series: depth, cam-pose, hand-pose, IMU, metadata. |
| visualization.rrd | Rerun | Optional replay file, opens in rerun-viewer. |
| calibrations/ | .npy + meta.json | Intrinsics, distortion, RGB↔depth extrinsics. |
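A quick way to confirm that an export finished is to check the episode directory against the file list above. The following is a minimal sketch using only the standard library; `missing_outputs` is a hypothetical helper, not a Stera function, and it checks only that the entries exist, not that their contents are valid.

```python
from pathlib import Path

# Names taken from the table above. visualization.rrd is listed as optional,
# so it is only checked on request.
REQUIRED = ["rgb.mp4", "mesh.ply", "thumbnail.jpg", "annotation.hdf5", "calibrations"]
OPTIONAL = ["visualization.rrd"]


def missing_outputs(episode_dir, include_optional=False):
    """Return the expected episode entries that are absent from episode_dir."""
    root = Path(episode_dir)
    names = REQUIRED + (OPTIONAL if include_optional else [])
    return [name for name in names if not (root / name).exists()]
```

An empty list means every expected entry is present. If an export deliberately skips some outputs, remove those names from `REQUIRED` before running the check.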
annotation.hdf5
The HDF5 file is the most important output for downstream training. It holds every time-series annotation behind one file handle:
    annotation.hdf5
    ├── /depth      per-RGB-frame depth maps, gzip-compressed
    ├── /cam-pose   camera pose translations + rotations
    ├── /imu        IMU samples
    ├── /hand-pose  hand detections (when buffered)
    └── /metadata   durations, frame counts, start/end timestamps

Every dataset's shape, dtype, and units are documented in the canonical reference: Process > Concepts > HDF5 schema.
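To see what a particular export actually contains, you can walk the file and print every dataset it holds. The sketch below assumes nothing about the dataset names inside each group, since those are defined by the schema reference rather than this page. `describe_datasets` is a hypothetical helper written against the duck-typed interface that h5py exposes: groups provide `.items()`, datasets provide `.shape` and `.dtype`.

```python
def describe_datasets(node, prefix=""):
    """Yield (path, shape, dtype) for every dataset under an HDF5-style group.

    Works on any object shaped like an h5py.File or h5py.Group: children are
    reached through .items(), and leaves are recognised by .shape and .dtype.
    """
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "shape") and hasattr(child, "dtype"):
            yield path, tuple(child.shape), str(child.dtype)
        elif hasattr(child, "items"):
            # Nested group: recurse and keep building the path.
            yield from describe_datasets(child, path)


# Usage with h5py (assumed to be installed; the path is illustrative):
#   import h5py
#   with h5py.File("episodes/run_01/annotation.hdf5", "r") as f:
#       for path, shape, dtype in describe_datasets(f):
#           print(path, shape, dtype)
```

Comparing this listing against the schema reference is a cheap guard against training on an episode that was exported with a subset of annotations.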
Coordinate frames
All poses in the export use a single convention — right-handed, +X right / +Y down / +Z forward in the camera frame, with depth in millimetres. See Process > Concepts > Coordinate frames before you start training.
Don't mix raw MCAP poses with exported HDF5 poses without checking the frame convention — MCAP carries the ARKit native frame, while session.export rebases to the Stera frame.
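As an illustration of the exported camera convention, the sketch below back-projects one depth pixel into a 3D point in the camera frame. It rests on several assumptions not stated on this page: an undistorted pinhole model, pixel coordinates with the origin at the top-left of the image, and depth measured along the optical axis. The function name and the intrinsics are hypothetical; real values would come from the calibrations/ directory.

```python
def backproject(u, v, depth_mm, fx, fy, cx, cy):
    """Map pixel (u, v) with depth in millimetres to (X, Y, Z) in metres.

    Assumes a pinhole camera without distortion and z-depth (distance along
    the optical axis), which this page does not confirm.
    """
    z = depth_mm / 1000.0      # millimetres -> metres
    x = (u - cx) * z / fx      # +X right: pixels right of centre give x > 0
    y = (v - cy) * z / fy      # +Y down: pixels below centre give y > 0
    return x, y, z             # +Z forward: z is positive in front of the camera


# Hypothetical intrinsics for a 640x480 image.
point = backproject(u=420, v=240, depth_mm=2000, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

The signs are the useful sanity check here: a pixel to the right of the principal point should give a positive X, and one below it a positive Y. If your points come out mirrored, the pose or depth data is probably still in a different frame.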
Choosing what to export
session.export() writes the full episode by default. To skip outputs, pass keyword flags:
    session.export(
        "episodes/run_01",
        write_mesh=False,                   # skip mesh.ply
        write_rrd=False,                    # skip visualization.rrd
        annotations=["depth", "cam-pose"],  # subset of HDF5 groups
    )

Full option list: Process > Guides > Episode export.