Map & geometry

Read the SLAM mesh, point cloud, and dense depth-derived reconstruction; texture-map vertices from RGB frames.

The SDK gives you several ways to get a 3D map out of a Stera recording, depending on what your SLAM system published and how dense (and how clean) you want the result.

Need a cleaned-up, densified, properly textured mesh? Reach for MeshRefiner, which bundles cleanup, Loop subdivision, and a view-angle / depth / occlusion-aware colorizer into one class. The raw building blocks below are still useful when building custom pipelines.

SLAM mesh

mesh = session.mesh()         # → tuple or None
if mesh is not None:
    verts, faces, colors = mesh
    print(verts.shape)        # (N, 3) world-frame metres
    print(faces.shape)        # (M, 3) int triangle indices
    print(colors)             # (N, 3) uint8 RGB or None

Reads from /map/mesh (a single Marker message) and returns None if the topic is missing.

This is the triangle mesh output by SLAM: clean topology, world-frame coordinates, and optional per-vertex colours. It is used by Visualizer(map_3d="mesh") and written to episode_dir/mesh.ply by session.export.

SLAM point cloud

xyz, rgb = session.point_cloud(source="auto")
# auto: tries /map/mesh_cloud first, falls back to /map/point_cloud

Source	Topic	When to use
`"auto"`	`/map/mesh_cloud` → `/map/point_cloud`	Default; works for most rigs.
`"mesh_cloud"`	`/map/mesh_cloud`	Force the accumulated SLAM cloud.
`"point_cloud"`	`/map/point_cloud`	Force the raw cloud.

Returns (np.ndarray (N, 3) float32, np.ndarray (N, 3) uint8 | None). Coordinates are world-frame metres; colours are pre-baked when the SLAM system provided them.
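
Because the colour array may be None, downstream code that expects one colour per point needs a fallback. A minimal sketch, assuming only the return shapes described above; colors_or_fallback and its grey default are hypothetical and not part of the SDK:

```python
import numpy as np

def colors_or_fallback(xyz, rgb, fallback=(128, 128, 128)):
    """Return an (N, 3) uint8 colour array for the N points in xyz.

    Hypothetical helper: if rgb is None, every point gets the constant
    fallback colour; otherwise rgb is shape-checked and returned as uint8.
    """
    n = len(xyz)
    if rgb is None:
        return np.tile(np.asarray(fallback, dtype=np.uint8), (n, 1))
    rgb = np.asarray(rgb)
    if rgb.shape != (n, 3):
        raise ValueError(f"expected colours of shape ({n}, 3), got {rgb.shape}")
    return rgb.astype(np.uint8, copy=False)
```

Calling colors_or_fallback(xyz, rgb) right after session.point_cloud(...) gives an array that is always safe to pass to a viewer or a PLY writer.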

Dense reconstruction from depth

When the recording has no usable map topic, build one from the per-frame depth + camera poses:

xyz, rgb = session.dense_point_cloud(
    every_n=10,                # use every Nth depth frame
    max_pts_per_frame=5_000,   # subsample per frame
    cam_exclude_radius=1.0,    # drop points within this radius of camera (m)
    voxel_size=0.02,           # voxel-grid downsample (m); 0 = no downsample
    min_depth=0.3,
    max_depth=5.0,
)

Walks session.frames() with a tqdm progress bar, back-projects each depth pixel into the world frame using the depth intrinsics and the per-frame camera pose, samples the matching RGB pixel for colour, and finally voxel-downsamples the union of all frames.
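
The two geometric steps, back-projection and voxel downsampling, can be sketched in plain NumPy. This is an illustration under stated assumptions, not the SDK's implementation: it assumes a pinhole intrinsics matrix K, a 4x4 camera-to-world transform, and depth in metres. The function names are hypothetical, and the max_pts_per_frame and cam_exclude_radius filters are not modelled.

```python
import numpy as np

def backproject_depth(depth, K, T_world_cam, min_depth=0.3, max_depth=5.0):
    """Back-project an (H, W) depth image (metres) into world-frame points.

    K is a (3, 3) pinhole intrinsics matrix; T_world_cam is an assumed
    (4, 4) camera-to-world transform with +z pointing forward.
    Returns (points (N, 3), pixel coords (N, 2) as (row, col)).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                      # row and column indices
    valid = np.isfinite(depth) & (depth >= min_depth) & (depth <= max_depth)
    z = depth[valid]
    u, v = u[valid], v[valid]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) / fx * z                          # pinhole model, camera frame
    y = (v - cy) / fy * z
    pts_cam = np.stack([x, y, z], axis=1)
    pts_world = pts_cam @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]
    return pts_world, np.stack([v, u], axis=1)

def voxel_downsample(xyz, voxel_size):
    """Return indices keeping the first point found in each occupied voxel."""
    if voxel_size <= 0:                            # 0 = no downsample
        return np.arange(len(xyz))
    keys = np.floor(xyz / voxel_size).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return np.sort(idx)
```

The returned pixel coordinates can be used to look up the matching RGB values, and the indices from voxel_downsample can be applied to both the points and their colours so the two arrays stay aligned.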

A 10-minute recording at 15 fps with every_n=10 and voxel_size=0.02 typically yields ~0.5-2M points after downsampling.
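
For a sense of scale, the raw point budget before downsampling can be estimated from the parameters. This is a back-of-the-envelope sketch that assumes the 15 fps figure applies to depth frames and that every sampled frame contributes the full max_pts_per_frame, so it is an upper bound:

```python
duration_s = 10 * 60          # 10-minute recording
fps = 15                      # assumed depth frame rate
every_n = 10
max_pts_per_frame = 5_000

total_frames = duration_s * fps                 # 9000
used_frames = total_frames // every_n           # 900
raw_points = used_frames * max_pts_per_frame    # 4_500_000
print(raw_points)
```

Against that bound of 4.5M raw points, a result of 0.5-2M means the voxel grid keeps roughly 11% to 44% of the samples.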

Color a vertex set from RGB frames

Useful when your mesh has no per-vertex colours, or when you want to texture a custom mesh you built externally:

verts = ...                                     # (N, 3) world-frame metres
colors = session.color_mesh(verts, every_n=10)  # (N, 3) uint8 RGB

For each vertex, color_mesh projects it into every Nth RGB frame using the camera pose and intrinsics, samples the colour, and averages across the frames in which the vertex was visible. Vertices that are never visible default to mid-grey.
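
The project-sample-average idea can be sketched as follows. This is not the SDK's color_mesh. The function name, the (rgb, K, T_world_cam) frame tuples, and the grey value of 128 are all assumptions made for illustration. Sampling is nearest-pixel, and "visible" only means in front of the camera and inside the image, with no occlusion test.

```python
import numpy as np

def color_vertices(verts, frames, default=128):
    """Average per-vertex colour over the frames each vertex projects into.

    verts: (N, 3) world-frame points.
    frames: iterable of (rgb (H, W, 3) uint8, K (3, 3), T_world_cam (4, 4)).
    Vertices that land in no frame get the assumed mid-grey `default`.
    """
    verts = np.asarray(verts, dtype=np.float64)
    acc = np.zeros((len(verts), 3), dtype=np.float64)
    count = np.zeros(len(verts), dtype=np.int64)
    for rgb, K, T_world_cam in frames:
        h, w = rgb.shape[:2]
        T_cam_world = np.linalg.inv(T_world_cam)       # world -> camera
        pts = verts @ T_cam_world[:3, :3].T + T_cam_world[:3, 3]
        z = pts[:, 2]
        in_front = z > 1e-6
        safe_z = np.where(in_front, z, 1.0)            # avoid dividing by <= 0
        u = np.rint(K[0, 0] * pts[:, 0] / safe_z + K[0, 2]).astype(np.int64)
        v = np.rint(K[1, 1] * pts[:, 1] / safe_z + K[1, 2]).astype(np.int64)
        vis = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        acc[vis] += rgb[v[vis], u[vis]]                # nearest-pixel sample
        count[vis] += 1
    colors = np.full((len(verts), 3), default, dtype=np.uint8)
    seen = count > 0
    colors[seen] = np.rint(acc[seen] / count[seen, None]).astype(np.uint8)
    return colors
```

Because nothing here checks occlusion, a vertex behind a wall still picks up the wall's colour.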

This is what Visualizer(map_3d="auto") does internally when the SLAM mesh comes back uncoloured. For a more detailed colorizer with per-frame bilinear sampling, view-angle / depth weighting, and a depth-buffer occlusion test, see MeshRefiner.

Putting it all together

A common pipeline for downstream training uses the SLAM mesh plus the dense cloud as a richer scene representation:

import trimesh

mesh = session.mesh()
if mesh is None:                      # /map/mesh topic missing
    raise RuntimeError("recording has no SLAM mesh")
verts, faces, colors = mesh
if colors is None:
    colors = session.color_mesh(verts, every_n=10)

# Save as a coloured PLY (any standard writer works: trimesh, open3d, …)
trimesh.Trimesh(
    vertices=verts, faces=faces, vertex_colors=colors
).export("scene.ply")

Or skip the SLAM mesh entirely and use the dense cloud:

xyz, rgb = session.dense_point_cloud(every_n=5, voxel_size=0.01)

When the recording has no depth

dense_point_cloud returns empty arrays if there are no valid depth frames. session.mesh() and session.point_cloud() only need the SLAM topics, so they still work.

If you only have RGB + camera poses, you can still get a useful trajectory:

import numpy as np

poses = session.all_camera_poses()
xyz = np.array([p.translation for _, p in poses])  # one position per pose
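
Once the positions are in an array, simple trajectory statistics follow directly. A minimal sketch, assuming xyz is an (N, 3) array of positions in time order; path_length is a hypothetical helper, not an SDK function:

```python
import numpy as np

def path_length(xyz):
    """Total distance along an ordered (N, 3) trajectory, in the units of xyz."""
    xyz = np.asarray(xyz, dtype=np.float64)
    if len(xyz) < 2:
        return 0.0
    steps = np.diff(xyz, axis=0)                 # displacement between poses
    return float(np.linalg.norm(steps, axis=1).sum())
```

With world-frame metres as input, path_length(xyz) gives the distance the camera travelled in metres.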

All map-geometry calls return data in the world frame. To bring per-frame depth into world coordinates yourself, use optical_to_world.