Quickstart
A 25-line pipeline that reads an MCAP, tracks hands, blurs faces, logs everything to Rerun, and exports a clean episode directory.
Pipeline
The pipeline below is the canonical shape of a script using the SDK: build a session, build the models, then loop over synced frames and call the session and visualizer hooks as you go.
import stera
from stera.data import MCAPReader
from stera.models import HandTracker, FaceBlurrer, UpperBodyEstimator
from stera.viz import Visualizer
stera.setup_logging() # show progress + stage logs
session = MCAPReader("recording.mcap")
hands = HandTracker(model="mediapipe")
blur = FaceBlurrer(model="mediapipe")
skeleton = UpperBodyEstimator(session=session)
visualizer = Visualizer(
    session,
    map_3d="both",                      # mesh + point cloud (PC hidden by default)
    mesh_refine={"color_speed": 0.5},   # 0 = fast draft, 1 = full quality
)
for frame in session.frames():
    frame_blurred = blur.blur(frame)                    # (H, W, 3) uint8 RGB
    hand_poses = hands.detect_hands(frame)
    body = skeleton.estimate(frame, hands=hand_poses)
    session.add_rgb_frame(frame.index, frame_blurred)   # → rgb.mp4
    session.add_hand_pose(frame.index, hand_poses)      # → annotation.hdf5
    visualizer.log_frame(frame, hands=hand_poses, skeleton=body)
session.export("episodes/run_01", visualizer=visualizer)
Run it with the SDK already installed:
python quickstart.py
You'll see structured progress logs from each stage: MCAP open, model load, the frame loop (with a tqdm bar), then per-stage export logs (Writing rgb.mp4, Writing /hand-pose, etc.) and a final saved/skipped manifest.
What the pipeline produces
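The session.export(...) call writes a self-contained episode directory containing rgb.mp4, mesh.ply, thumbnail.jpg, annotation.hdf5, calibrations, and, when a visualizer is passed, visualization.rrd.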
Open the visualization with:
rerun episodes/run_01/visualization.rrd
For the full file-by-file breakdown, see Episode layout and HDF5 schema.
Walkthrough
1. Open the recording
session = MCAPReader("recording.mcap")
MCAPReader is permissive by default: if a topic is missing from your MCAP (no depth, no poses, etc.), the corresponding fields simply come back as None later. Pass check_format=True if you want strict validation against the reference topic fingerprint. See Reading MCAP.
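Because missing topics surface as None rather than as errors, downstream code usually needs a guard before touching optional data. The sketch below shows that pattern on a stand-in object. FrameStub and its field names (rgb, depth, pose) are hypothetical and are not the SDK's SyncedFrame; check the Reading MCAP page for the real attribute names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FrameStub:
    """Hypothetical stand-in for a synced frame; not the SDK's SyncedFrame."""
    index: int
    rgb: Any
    depth: Optional[Any] = None  # None when the recording has no depth topic
    pose: Optional[Any] = None   # None when the recording has no pose topic


def missing_fields(frame):
    """List the optional fields that came back as None for this frame."""
    return [name for name in ("depth", "pose") if getattr(frame, name) is None]


# A frame from a recording that has camera poses but no depth stream.
frame = FrameStub(index=0, rgb="rgb-pixels", pose="camera-pose")
skipped = missing_fields(frame)
```

Here skipped contains only "depth", so a depth-dependent stage can be bypassed for this recording while the rest of the loop runs unchanged.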
2. Build models
hands = HandTracker(model="mediapipe")
blur = FaceBlurrer(model="mediapipe")
Both wrappers take model="..." plus optional model_path= and config kwargs. We use mediapipe here because it has zero external setup (just the pip extra). To use the higher-accuracy backends, change one line:
hands = HandTracker(model="wilor", model_path="/opt/WiLoR")
hands = HandTracker(model="hamer", model_path="/opt/hamer")
blur = FaceBlurrer(model="egoblur", model_path="/opt/EgoBlur")
Everything else in the loop stays the same. See Hand tracking and Face blurring.
3. Iterate synced frames
for frame in session.frames():
    ...
session.frames() yields a SyncedFrame per RGB frame, with the nearest depth, camera pose, and IMU sample already attached. The RGB clock drives the loop; depth and pose are matched to each RGB timestamp via nearest-neighbour search with a configurable max delta.
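To make the matching rule concrete, here is a minimal sketch of nearest-neighbour timestamp matching with a maximum delta. It is an illustration of the general technique, not the SDK's implementation; the function name nearest_within, the millisecond timestamps, and the 10 ms threshold are all assumptions chosen for the example.

```python
from bisect import bisect_left


def nearest_within(timestamps, t, max_delta):
    """Return the index of the timestamp closest to t, or None if the
    closest one is farther away than max_delta.

    `timestamps` must be sorted in ascending order.
    """
    if not timestamps:
        return None
    i = bisect_left(timestamps, t)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    return best if abs(timestamps[best] - t) <= max_delta else None


# Illustrative timestamps in milliseconds: the RGB clock drives the loop.
rgb_ts_ms = [0, 33, 66, 100]
depth_ts_ms = [1, 36, 98, 250]

matches = [nearest_within(depth_ts_ms, t, max_delta=10) for t in rgb_ts_ms]
# matches == [0, 1, None, 2]
```

The third RGB frame has no depth sample within 10 ms, so its match is None. That mirrors the behaviour described above, where a stream that cannot be matched leaves the corresponding field empty instead of attaching stale data.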
4. Buffer + visualize
session.add_rgb_frame(frame.index, frame_blurred)
session.add_hand_pose(frame.index, hand_poses)
visualizer.log_frame(frame, hands=hand_poses, skeleton=body)
add_rgb_frame lazily opens an internal H.264 writer the first time it's called; the resulting mp4 is finalized into the episode directory by session.export. add_hand_pose accumulates per-frame detections; they get written to annotation.hdf5:/hand-pose at export time.
visualizer.log_frame streams the same data to a .rrd file with overlays, frustums, the 3D map, and IMU traces. It's safe to skip if you don't want viz output.
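The buffering behaviour described above (open on first use, accumulate per frame, write at export) is a common pattern. The toy class below sketches it with an in-memory list instead of a video or HDF5 writer. LazyBuffer and its methods are hypothetical names for illustration and say nothing about how the SDK is implemented internally.

```python
class LazyBuffer:
    """Toy illustration: create the sink on first write, hand data over at export."""

    def __init__(self, opener):
        self._opener = opener  # callable that creates the sink when first needed
        self._sink = None
        self.open_count = 0    # how many times the sink was created

    def add(self, index, item):
        # Open lazily so a script that never adds data never creates output.
        if self._sink is None:
            self._sink = self._opener()
            self.open_count += 1
        self._sink.append((index, item))

    def export(self):
        # Nothing was buffered: there is nothing to write.
        return [] if self._sink is None else list(self._sink)


buf = LazyBuffer(opener=list)
opened_before = buf.open_count       # 0: nothing has been opened yet
buf.add(0, "hands-frame-0")
buf.add(1, "hands-frame-1")
exported = buf.export()
```

The sink is created exactly once, on the first add call, and a buffer that never receives data exports nothing. This is why dropping an add_* call from the loop removes that output without any other changes.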
5. Export
session.export("episodes/run_01", visualizer=visualizer)
A single call assembles rgb.mp4, mesh.ply, thumbnail.jpg, annotation.hdf5, calibrations, and (when given) visualization.rrd into the target directory. Anything that can't be produced is logged as skipped in the final manifest. See Episode export.
Every step in the loop is optional: you can drop any of the add_* calls or
the visualizer if you don't need that output. The export call writes only
what's available.
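As a rough mental model of the saved/skipped manifest, the sketch below splits a set of outputs by whether any data was buffered for them. The build_manifest function and the dictionary layout are assumptions made for illustration; the actual manifest format is documented under Episode export.

```python
def build_manifest(outputs):
    """Split outputs into saved and skipped; None marks an unavailable output."""
    saved = [name for name, data in outputs.items() if data is not None]
    skipped = [name for name, data in outputs.items() if data is None]
    return {"saved": saved, "skipped": skipped}


# A run that buffered RGB frames and hand poses but passed no visualizer
# and has no mesh to write.
manifest = build_manifest({
    "rgb.mp4": "encoded-video",
    "annotation.hdf5": "hand-poses",
    "mesh.ply": None,
    "visualization.rrd": None,
})
```

In this example the manifest lists rgb.mp4 and annotation.hdf5 as saved and mesh.ply and visualization.rrd as skipped, and the export still completes.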
Next
- Reading MCAP: frames(), rgb_frames(), intrinsics, IMU.
- Hand tracking: backend trade-offs, depth anchoring, HandPose schema.
- Visualization: Rerun blueprint, map_3d modes, headless viewing.
- Mesh refinement: cleaning, densifying, and texture-mapping the SLAM mesh.