Output formats

What Stera reads in (MCAP) and what it writes out (HDF5, MP4, PLY, RRD).

A Stera session moves through two formats: MCAP going in, an episode directory coming out.

Input — MCAP

Each recording from the Stera App is a single .mcap file. It bundles every sensor stream — RGB, depth, ARKit pose, IMU — on a shared clock.

Process reads MCAPs through MCAPReader:

from stera.data import MCAPReader

reader = MCAPReader("recording.mcap")
for frame in reader.frames():
    rgb, depth, pose = frame.rgb, frame.depth, frame.cam_pose

Full reference: Process > Guides > Reading MCAP.

Output — Episode directory

After processing, session.export(out_dir) writes a complete episode you can drop straight into a training pipeline.

File               Format            What it is
rgb.mp4            H.264 MP4         Original RGB video at capture framerate.
mesh.ply           PLY               Scene mesh reconstructed from depth + ARKit.
thumbnail.jpg      JPEG              One-frame preview, useful for dataset browsers.
annotation.hdf5    HDF5              All time-series: depth, cam-pose, hand-pose, IMU, metadata.
visualization.rrd  Rerun             Optional replay file, opens in rerun-viewer.
calibrations/      .npy + meta.json  Intrinsics, distortion, RGB↔depth extrinsics.
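
Before handing an episode to a training pipeline, it can help to confirm that the expected files are present. The sketch below checks a directory against the file names in the table above. The helper name missing_outputs is hypothetical and not part of Stera. It leaves visualization.rrd out of the required set because the table marks it optional; if you disabled other outputs at export time, pass your own required set.

```python
from pathlib import Path

# File names taken from the table above; visualization.rrd is optional,
# so it is deliberately left out of the required set.
REQUIRED = ("rgb.mp4", "mesh.ply", "thumbnail.jpg", "annotation.hdf5", "calibrations")


def missing_outputs(episode_dir, required=REQUIRED):
    """Return the required entries that are absent from episode_dir.

    Hypothetical helper for illustration; not part of the Stera API.
    """
    root = Path(episode_dir)
    return [name for name in required if not (root / name).exists()]
```

Calling missing_outputs("episodes/run_01") returns an empty list when every required entry exists, and the names of the absent entries otherwise.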

annotation.hdf5

The HDF5 file is the most important output for downstream training. It holds every time-series annotation behind one file handle:

annotation.hdf5
├── /depth           per-RGB-frame depth maps, gzip-compressed
├── /cam-pose        camera pose translations + rotations
├── /imu             IMU samples
├── /hand-pose       hand detections (when buffered)
└── /metadata        durations, frame counts, start/end timestamps

Every dataset's shape, dtype, and units are documented in the canonical reference: Process > Concepts > HDF5 schema.
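
To see what a particular export contains before consulting the schema, you can walk the file with h5py (a third-party package, assumed to be installed). This is a minimal sketch: describe_hdf5 is a hypothetical helper, not part of Stera, and it makes no assumption about the dataset names inside each group. It simply lists whatever it finds.

```python
import h5py


def describe_hdf5(path):
    """Return {dataset_path: (shape, dtype_string)} for every dataset in the file.

    Hypothetical helper for illustration; not part of the Stera API.
    """
    summary = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            summary["/" + name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary
```

Iterating over describe_hdf5("episodes/run_01/annotation.hdf5").items() gives one (shape, dtype) pair per dataset, which you can compare against the schema reference.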

Coordinate frames

All poses in the export use a single convention — right-handed, +X right / +Y down / +Z forward in the camera frame, with depth in millimetres. See Process > Concepts > Coordinate frames before you start training.

Don't mix raw MCAP poses with exported HDF5 poses without checking the frame convention — MCAP carries the ARKit native frame, while session.export rebases to the Stera frame.
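
As an illustration of why the convention matters, the sketch below converts a 4x4 camera-to-world pose from camera axes of +X right / +Y up / +Z backward (the usual description of ARKit's camera frame) to the +X right / +Y down / +Z forward axes described above. It is an assumption that this axis flip is the relevant difference. The actual rebasing done by session.export is defined in the Coordinate frames reference, and the function name here is hypothetical.

```python
import numpy as np

# Flips the camera's Y and Z axes: (right, up, backward) -> (right, down, forward).
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])


def flip_camera_axes(cam_to_world):
    """Re-express a 4x4 camera-to-world pose with Y-down / Z-forward camera axes.

    Hypothetical helper; assumes the input uses Y-up / Z-backward camera axes.
    Right-multiplying changes only the camera-local axes, not the world frame.
    """
    cam_to_world = np.asarray(cam_to_world, dtype=float)
    if cam_to_world.shape != (4, 4):
        raise ValueError("expected a 4x4 homogeneous transform")
    return cam_to_world @ FLIP_YZ
```

The camera position (the translation column) is unchanged by this operation, and applying the function twice returns the original pose, which makes a convenient sanity check.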

Choosing what to export

session.export() writes the full episode by default. To skip individual outputs or limit which HDF5 groups are written, pass keyword arguments:

session.export(
    "episodes/run_01",
    write_mesh=False,         # skip mesh.ply
    write_rrd=False,          # skip visualization.rrd
    annotations=["depth", "cam-pose"],   # subset of HDF5 groups
)

Full option list: Process > Guides > Episode export.