Quickstart

A 25-line pipeline that reads an MCAP, tracks hands, blurs faces, logs everything to Rerun, and exports a clean episode directory.

Pipeline

The pipeline below is the canonical shape of a script using the SDK: build a session, build the models, then loop over synced frames and call session/visualizer hooks as you go.

quickstart.py

import stera
from stera.data import MCAPReader
from stera.models import HandTracker, FaceBlurrer, UpperBodyEstimator
from stera.viz import Visualizer

stera.setup_logging()  # show progress + stage logs

session    = MCAPReader("recording.mcap")
hands      = HandTracker(model="mediapipe")
blur       = FaceBlurrer(model="mediapipe")
skeleton   = UpperBodyEstimator(session=session)
visualizer = Visualizer(
    session,
    map_3d="both",                          # mesh + point cloud (PC hidden by default)
    mesh_refine={"color_speed": 0.5},       # 0 = fast draft, 1 = full quality
)

for frame in session.frames():
    frame_blurred = blur.blur(frame)        # (H, W, 3) uint8 RGB
    hand_poses    = hands.detect_hands(frame)
    body          = skeleton.estimate(frame, hands=hand_poses)

    session.add_rgb_frame(frame.index, frame_blurred)   # → rgb.mp4
    session.add_hand_pose(frame.index, hand_poses)      # → annotation.hdf5
    visualizer.log_frame(frame, hands=hand_poses, skeleton=body)

session.export("episodes/run_01", visualizer=visualizer)

Run it with the SDK already installed:

python quickstart.py

You'll see structured progress logs from each stage: MCAP open, model load, the frame loop (with a tqdm bar), then per-stage export logs (Writing rgb.mp4, Writing /hand-pose, etc.) and a final saved/skipped manifest.

What the pipeline produces

The session.export(...) call writes a self-contained episode directory:

rgb.mp4
mesh.ply
thumbnail.jpg
visualization.rrd
annotation.hdf5
rgb_K.npy
rgb_D.npy
depth_K.npy
depth_D.npy
R_optical_to_link.npy
meta.json

Open the visualization with:

rerun episodes/run_01/visualization.rrd

For the full file-by-file breakdown see Episode layout and HDF5 schema.
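To sanity-check an export, a small standard-library script can compare the directory against the file list above. This is a sketch rather than part of the SDK: check_episode is a hypothetical helper, and it treats every listed file as expected even though the export may legitimately skip some (for example visualization.rrd when no visualizer is passed).

```python
from pathlib import Path

# File names taken from the episode listing above.
EXPECTED_FILES = [
    "rgb.mp4", "mesh.ply", "thumbnail.jpg", "visualization.rrd",
    "annotation.hdf5", "rgb_K.npy", "rgb_D.npy", "depth_K.npy",
    "depth_D.npy", "R_optical_to_link.npy", "meta.json",
]


def check_episode(episode_dir):
    """Return (present, missing) lists of expected file names.

    Hypothetical helper: it only checks that files exist, not their contents.
    """
    root = Path(episode_dir)
    present = [name for name in EXPECTED_FILES if (root / name).is_file()]
    missing = [name for name in EXPECTED_FILES if name not in present]
    return present, missing


if __name__ == "__main__":
    present, missing = check_episode("episodes/run_01")
    print("present:", present)
    print("missing:", missing)
```

A non-empty missing list is not necessarily an error; compare it with the skipped entries in the export manifest.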

Walkthrough

1. Open the recording

session = MCAPReader("recording.mcap")

MCAPReader is permissive by default: if a topic is missing from your MCAP (no depth, no poses, etc.), the corresponding fields just come back as None later. Pass check_format=True if you want strict validation against the reference topic fingerprint. See Reading MCAP.
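The difference between the permissive and strict modes can be pictured with a standalone sketch. It does not use or reproduce MCAPReader's internals: resolve_topics is a hypothetical function, and the topic names are placeholders, not the SDK's reference fingerprint.

```python
# Placeholder topic names, assumed for illustration only.
REQUIRED_TOPICS = {"/rgb", "/depth", "/pose", "/imu"}


def resolve_topics(found, check_format=False):
    """Map each expected topic to itself if present, else None.

    With check_format=True, any missing topic raises instead of
    silently becoming None.
    """
    missing = sorted(REQUIRED_TOPICS - set(found))
    if check_format and missing:
        raise ValueError(f"recording is missing topics: {missing}")
    return {t: (t if t in found else None) for t in sorted(REQUIRED_TOPICS)}


# Permissive: absent topics come back as None.
fields = resolve_topics({"/rgb", "/imu"})
```

Strict mode is useful in batch jobs where a recording without depth or poses should fail early instead of producing a partial episode.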

2. Build models

hands = HandTracker(model="mediapipe")
blur  = FaceBlurrer(model="mediapipe")

Both wrappers take model="..." plus optional model_path= and config kwargs. We use mediapipe here because it has zero external setup (just the pip extra). To use the higher-accuracy backends, change one line:

hands = HandTracker(model="wilor",  model_path="/opt/WiLoR")
hands = HandTracker(model="hamer",  model_path="/opt/hamer")
blur  = FaceBlurrer(model="egoblur", model_path="/opt/EgoBlur")

Everything else in the loop stays the same. See Hand tracking and Face blurring.
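Because only the constructor arguments change between backends, the choice can be moved to a command-line flag. The sketch below is one way to do that, under stated assumptions: hand_tracker_kwargs and parse_args are hypothetical helpers, and the paths are simply the ones used in the examples above.

```python
import argparse

# Backend names and install paths copied from the examples above; adjust the
# paths to wherever the model repositories live on your machine.
HAND_BACKENDS = {
    "mediapipe": {"model": "mediapipe"},
    "wilor": {"model": "wilor", "model_path": "/opt/WiLoR"},
    "hamer": {"model": "hamer", "model_path": "/opt/hamer"},
}


def hand_tracker_kwargs(name):
    """Look up constructor kwargs for a backend name (hypothetical helper)."""
    if name not in HAND_BACKENDS:
        raise ValueError(
            f"unknown hand backend {name!r}; choose from {sorted(HAND_BACKENDS)}"
        )
    return dict(HAND_BACKENDS[name])  # copy so callers cannot mutate the table


def parse_args(argv=None):
    """Parse a --hands flag that defaults to the zero-setup backend."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--hands", default="mediapipe",
                        choices=sorted(HAND_BACKENDS))
    return parser.parse_args(argv)
```

In the quickstart script this would be used as hands = HandTracker(**hand_tracker_kwargs(parse_args().hands)), leaving the frame loop untouched.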

3. Iterate synced frames

for frame in session.frames():
    ...

session.frames() yields a SyncedFrame per RGB frame, with the nearest depth, camera pose, and IMU sample already attached. The RGB clock drives the loop; depth and pose are matched to each RGB timestamp via nearest-neighbour search with a configurable max delta.
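The matching step can be illustrated independently of the SDK. The sketch below assumes sorted timestamps in a single shared unit and uses a binary search; nearest_within and sync_to_rgb are hypothetical names, and the SDK's actual implementation and parameter names may differ.

```python
from bisect import bisect_left


def nearest_within(timestamps, target, max_delta):
    """Index of the timestamp closest to target, or None if too far away.

    timestamps must be sorted ascending; all values share one unit.
    """
    if not timestamps:
        return None
    pos = bisect_left(timestamps, target)
    # The nearest neighbour is either just before pos or at pos.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(timestamps)]
    best = min(candidates, key=lambda i: abs(timestamps[i] - target))
    return best if abs(timestamps[best] - target) <= max_delta else None


def sync_to_rgb(rgb_ts, other_ts, max_delta):
    """Pair each RGB timestamp with the nearest index in other_ts, or None."""
    return [(t, nearest_within(other_ts, t, max_delta)) for t in rgb_ts]


# Timestamps in milliseconds; depth samples further than 10 ms away are dropped.
pairs = sync_to_rgb([0, 33, 66, 100], [2, 40, 200], max_delta=10)
# pairs == [(0, 0), (33, 1), (66, None), (100, None)]
```

Driving the loop from the RGB clock means every RGB frame appears exactly once, while a slower or missing stream simply yields None for the frames it cannot cover.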

4. Buffer + visualize

session.add_rgb_frame(frame.index, frame_blurred)
session.add_hand_pose(frame.index, hand_poses)
visualizer.log_frame(frame, hands=hand_poses, skeleton=body)

add_rgb_frame lazily opens an internal H.264 writer the first time it's called; the resulting mp4 is finalized into the episode directory by session.export. add_hand_pose accumulates per-frame detections; they get written to annotation.hdf5:/hand-pose at export time.
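The lazy-open behaviour follows a common pattern, sketched here with a plain binary file standing in for the H.264 encoder. LazyFrameWriter is a hypothetical class written for illustration; it says nothing about how the SDK's writer is actually implemented.

```python
class LazyFrameWriter:
    """Opens its output only when the first frame arrives (illustrative only)."""

    def __init__(self, path):
        self.path = path
        self._handle = None       # nothing is created until add() is called
        self.frames_written = 0

    def add(self, payload: bytes):
        """Append one frame, opening the output on first use."""
        if self._handle is None:
            self._handle = open(self.path, "wb")
        self._handle.write(payload)
        self.frames_written += 1

    def close(self):
        """Finalize the file; returns False if no frame was ever added."""
        if self._handle is None:
            return False
        self._handle.close()
        self._handle = None
        return True
```

The benefit of this design is that a script which never calls the add method leaves no empty output file behind, which fits the export step reporting that output as skipped.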

visualizer.log_frame streams the same data to a .rrd file with overlays, frustums, the 3D map, and IMU traces. It's safe to skip if you don't want viz output.

5. Export

session.export("episodes/run_01", visualizer=visualizer)

A single call assembles rgb.mp4, mesh.ply, thumbnail.jpg, annotation.hdf5, calibrations, and (when given) visualization.rrd into the target directory. Anything that can't be produced is logged as skipped in the final manifest. See Episode export.

Every step in the loop is optional: you can drop any of the add_* calls or the visualizer if you don't need that output. The export call only writes what's available.
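The "write what's available, report the rest as skipped" behaviour can be sketched in a few lines. This is an illustration of the idea only: export_artifacts and the producer callables are hypothetical, and the SDK's real manifest format is not shown here.

```python
def export_artifacts(producers):
    """Run each producer and record what was saved and what was skipped.

    producers maps an output name to a zero-argument callable that returns
    True when it wrote its output and False when it had nothing to write.
    """
    manifest = {"saved": [], "skipped": []}
    for name, produce in producers.items():
        key = "saved" if produce() else "skipped"
        manifest[key].append(name)
    return manifest


# Stand-ins for buffered data: RGB frames were added, hand poses were not.
rgb_frames = [b"frame0", b"frame1"]
hand_poses = []

manifest = export_artifacts({
    "rgb.mp4": lambda: len(rgb_frames) > 0,
    "annotation.hdf5": lambda: len(hand_poses) > 0,
})
```

Here the manifest lists rgb.mp4 under saved and annotation.hdf5 under skipped, mirroring what happens when an add_* call is dropped from the loop.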

Reading MCAP: frames(), rgb_frames(), intrinsics, IMU.
Hand tracking: backend trade-offs, depth anchoring, HandPose schema.
Visualization: Rerun blueprint, map_3d modes, headless viewing.
Mesh refinement: cleaning, densifying, and texture-mapping the SLAM mesh.