Hands clasping. Plate 533, Animal Locomotion (Muybridge, 1887).

Stera is open infrastructure for Embodied AI

Capture, process, and export multimodal data on hardware you already own.

Dataset · Stera-10M

Stera-10M. Open today.

Ten million frames of in-the-wild egocentric activity. 200 hours. 354 sessions. Longest continuous capture: 108 minutes. Every frame is annotated with depth, 6-DoF pose, MANO hands, and a hierarchical instruction tree that runs from atomic actions up to the whole session. CC-BY-NC 4.0, on Hugging Face.

Browse on HF →
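To make the "atomic-to-session-scale instruction tree" concrete, here is a minimal Python sketch of one way such a tree could be represented. Each node carries a text label and a half-open frame range, and its children subdivide that range. The class name, its fields, and the tea-making example are hypothetical illustrations, not the Stera-10M schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InstructionNode:
    """One node of a hypothetical hierarchical instruction tree.

    Leaves are atomic actions; the root spans the whole session.
    Field names are illustrative, not the Stera-10M schema.
    """

    text: str
    start_frame: int
    end_frame: int  # exclusive
    children: list[InstructionNode] = field(default_factory=list)

    def depth(self) -> int:
        # A leaf has depth 1; each level of nesting adds one.
        return 1 + max((c.depth() for c in self.children), default=0)

    def atomic_at(self, frame: int) -> str | None:
        """Return the label of the finest-grained node covering `frame`."""
        if not (self.start_frame <= frame < self.end_frame):
            return None
        for child in self.children:
            hit = child.atomic_at(frame)
            if hit is not None:
                return hit
        return self.text


# Hypothetical example session with three levels of granularity.
session = InstructionNode("Make tea", 0, 900, [
    InstructionNode("Boil water", 0, 400, [
        InstructionNode("Fill kettle", 0, 150),
        InstructionNode("Switch kettle on", 150, 400),
    ]),
    InstructionNode("Steep tea bag", 400, 900),
])
```

With this layout, session.atomic_at(200) returns the finest label covering frame 200 ("Switch kettle on"), while the root label still describes the session as a whole.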

Record

A data lab in your pocket.

Start your own portable data lab with an iPhone Pro and the Stera Capture app. ARKit fuses RGB, depth, IMU, and 6-DoF tracking entirely on-device, so you can capture ego, exo, third-person, or robot manipulation data anywhere. Just mount, record, and go.

Get the app → · Capture guide

Stera Capture — recording settings sheet

Stera Capture — home, collect multimodal data

Stera Capture — library of uploaded sessions

Process

Raw frames in. Research signals out.

One pipeline turns every session into the modalities that Physical AI needs. RGB-D. 6-DoF poses. 21 MANO articulations per hand. IMU. Upper-body coordinates. 3D mesh for real-to-sim. Hierarchical textual instruction trees, atomic to abstract. No human in the loop.

Read the pipeline →
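As a rough picture of what "research signals out" could look like for a single frame, the sketch below bundles the per-frame modalities listed above into one Python dataclass. All field names, units, and layouts are assumptions; the only number taken from the text is the 21 hand articulations, treated here as 21 3D keypoints per hand. Session-level outputs such as the scene mesh and the instruction tree are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
MANO_KEYPOINTS_PER_HAND = 21  # count stated in the pipeline description


@dataclass
class FrameRecord:
    """Hypothetical per-frame bundle of the pipeline's outputs.

    Field names, units, and layouts are illustrative assumptions,
    not the Stera schema.
    """

    timestamp_s: float
    depth_hw: Tuple[int, int]  # depth-map resolution (rows, cols)
    cam_translation: Vec3  # 6-DoF pose, part 1: position
    cam_rotation_wxyz: Tuple[float, float, float, float]  # part 2: orientation
    imu_accel: Vec3
    imu_gyro: Vec3
    left_hand: Optional[Sequence[Vec3]] = None  # None when the hand is not tracked
    right_hand: Optional[Sequence[Vec3]] = None
    upper_body: Optional[Sequence[Vec3]] = None  # joint count not specified here

    def __post_init__(self) -> None:
        # Reject tracked hands that do not carry exactly 21 keypoints.
        for name, hand in (("left_hand", self.left_hand), ("right_hand", self.right_hand)):
            if hand is not None and len(hand) != MANO_KEYPOINTS_PER_HAND:
                raise ValueError(
                    f"{name}: expected {MANO_KEYPOINTS_PER_HAND} keypoints, got {len(hand)}"
                )

    def hands_visible(self) -> int:
        """Number of hands tracked in this frame (0, 1, or 2)."""
        return sum(h is not None for h in (self.left_hand, self.right_hand))
```

Validating hand shapes at construction time means a malformed frame fails loudly where it is created rather than later in a training loop.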

lossfunk

Stera has abstracted away the infrastructure that allows us to collect high-fidelity multimodal bespoke data for world-model research.

Paras Chopra

Queen's University

We dropped Stera into our lab pipeline and were collecting research-grade ego data within a week. The dataset alone saved us months of sensor rigging.

Dr. Aisha Rahman

FAQs

How is Stera different from Project Aria or Ego-Exo4D?

Aria and Ego-Exo4D both produce excellent egocentric data, but each depends on gated hardware access. Stera runs on a consumer iPhone Pro and ships the entire capture and processing stack as open code. The result: anyone can start capturing data tomorrow, in any environment, without lab approval. You get hour-plus continuous sessions with depth, 6-DoF pose, and MANO hands, comparable in fidelity to Aria and accessible to anyone.

Can I use Stera-10M to train a commercial model?

The dataset is released under CC-BY-NC 4.0, so direct use in commercial training requires a separate licence. The Stera SDK is MIT-licensed — you can use the capture stack and processing pipeline to record your own commercial-grade dataset. For commercial licensing of Stera-10M itself, contact us.

What hardware do I need to capture my own data?

An iPhone Pro (12 Pro or newer, with LiDAR) and the Stera Capture iOS app. Five capture modes are supported: ego (head-mounted), ego+exo (multi-iPhone synced), exocentric tripod, UMI-compatible gripper, and static observer. No external sensors or rigs required.

What format does the dataset come in?

Each session is delivered as a directory containing one MP4 of RGB video, one HDF5 file with all per-frame annotations (depth, pose, MANO hands, IMU, hierarchical text labels), a PLY scene mesh, calibrations, and Rerun SDK visualization recordings. The SDK exports to LeRobot format, raw MCAP, and Rerun's RRD format for visualization.
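A quick way to sanity-check a downloaded session is to confirm that each artifact type listed above is present. The sketch below assumes artifacts can be identified by file extension (.mp4, .h5 or .hdf5, .ply, .rrd); real Stera-10M filenames may differ, and calibration files are skipped because their format is not stated.

```python
from __future__ import annotations

from pathlib import Path

# Assumed mapping from artifact type to file extension(s), inferred from the
# format description above; actual Stera-10M filenames may differ.
EXPECTED_ARTIFACTS = {
    "rgb video": (".mp4",),
    "annotations": (".h5", ".hdf5"),
    "scene mesh": (".ply",),
    "rerun recording": (".rrd",),
}


def missing_artifacts(session_dir: str | Path) -> list[str]:
    """Return the artifact types that have no matching file in session_dir."""
    suffixes = {p.suffix.lower() for p in Path(session_dir).iterdir() if p.is_file()}
    return [
        name
        for name, exts in EXPECTED_ARTIFACTS.items()
        if suffixes.isdisjoint(exts)
    ]
```

An empty list means every expected artifact type was found.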

Can I bring my own storage bucket?

Yes. The Stera Capture app and SDK both support custom storage backends. Configure your own S3, GCS, or Azure bucket in the app settings or via the SDK config, and all recordings and processed outputs route there instead of FPV's default storage. See the storage configuration guide for setup.
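The Stera SDK's actual configuration keys are not described here, so the following is a hypothetical sketch of how a custom bucket setting might be parsed and used to route outputs. It assumes bucket URIs in the common s3://, gs://, and az:// forms (the last as used by tools such as fsspec) and an illustrative prefix/session/file key layout; none of these names come from the Stera documentation.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table; these URI forms are common conventions,
# not documented Stera configuration values.
BACKENDS = {"s3": "aws-s3", "gs": "gcs", "az": "azure-blob"}


@dataclass(frozen=True)
class StorageTarget:
    backend: str
    bucket: str  # bucket (S3/GCS) or container (Azure)
    prefix: str  # optional key prefix, without leading/trailing slashes


def parse_storage_uri(uri: str) -> StorageTarget:
    """Parse e.g. 's3://my-bucket/stera/raw' into backend, bucket, and prefix."""
    parsed = urlparse(uri)
    if parsed.scheme not in BACKENDS:
        raise ValueError(f"unsupported scheme {parsed.scheme!r}; expected one of {sorted(BACKENDS)}")
    if not parsed.netloc:
        raise ValueError("storage URI must name a bucket or container")
    return StorageTarget(BACKENDS[parsed.scheme], parsed.netloc, parsed.path.strip("/"))


def object_key(target: StorageTarget, session_id: str, filename: str) -> str:
    """Build the object key a recording would be written to (illustrative layout)."""
    return "/".join(part for part in (target.prefix, session_id, filename) if part)
```

Parsing the setting once into a small immutable value keeps the upload code independent of which cloud provider the lab uses.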

How big is the full Stera-10M download?

Approximately 3.85 GB compressed across 354 sessions. Sessions can be downloaded individually through the Hugging Face dataset viewer, or the full corpus can be streamed with the Hugging Face datasets library.
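To fetch only a few sessions programmatically, the huggingface_hub library's snapshot_download accepts allow_patterns to limit what is downloaded. The sketch below assumes a placeholder repo id and one top-level folder per session; both are guesses, so check the dataset page for the real repo id and layout.

```python
from __future__ import annotations

# Placeholder repo id: replace with the id shown on the Stera-10M dataset page.
REPO_ID = "your-org/Stera-10M"


def session_patterns(session_ids: list[str]) -> list[str]:
    """Glob patterns that select only the given top-level session folders.

    Assumes one folder per session, which is a guess about the repo layout.
    """
    return [f"{sid.strip('/')}/*" for sid in session_ids]


def download_sessions(session_ids: list[str], local_dir: str) -> str:
    """Download the selected sessions and return the local folder path."""
    # Imported lazily so session_patterns() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=session_patterns(session_ids),
        local_dir=local_dir,
    )
```

For streaming, the datasets library's load_dataset(..., streaming=True) is the usual entry point; whether it works directly on this repo depends on how the files are organised, which is not stated here.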

Will Stera support Android or other hardware?

Not at launch. Stera is iPhone-Pro-only today because we rely on ARKit's on-device sensor fusion and LiDAR. The SDK ingests any MCAP-formatted recording, so adding a new platform is well-scoped — Android support is on the roadmap. If you're building on Stera and need another platform, let us know.

What does "fully automated" actually mean for annotations?

Every annotation in Stera-10M is generated by the open Stera SDK — no manual labelling. Hand pose, camera trajectory, scene mesh, and action language all come from the automated pipeline. We score every recording against structural and consistency invariants before release; 87% of sessions pass on first run and the rest are corrected automatically. The full evaluation harness is in the SDK.
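The specific invariants the SDK checks are not listed here. As an illustration only, the sketch below applies three plausible checks of the two kinds named above: a structural one (every per-frame stream has one entry per timestamp) and two consistency ones (timestamps strictly increase, and camera rotations are unit quaternions). The function name and its inputs are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def check_session(
    timestamps: Sequence[float],
    streams: dict[str, Sequence],
    quats_wxyz: Sequence[Sequence[float]],
    tol: float = 1e-3,
) -> list[str]:
    """Return human-readable invariant violations (an empty list means pass).

    Illustrative invariants only; the SDK's actual checks are not listed here.
    """
    failures = []
    n = len(timestamps)
    # Structural: every per-frame stream has one entry per timestamp.
    for name, values in streams.items():
        if len(values) != n:
            failures.append(f"{name}: {len(values)} entries for {n} frames")
    # Consistency: timestamps strictly increase (report the first violation).
    for i in range(1, n):
        if timestamps[i] <= timestamps[i - 1]:
            failures.append(f"timestamp not increasing at frame {i}")
            break
    # Consistency: camera rotations are unit quaternions (first violation).
    for i, q in enumerate(quats_wxyz):
        if abs(math.sqrt(sum(c * c for c in q)) - 1.0) > tol:
            failures.append(f"non-unit quaternion at frame {i}")
            break
    return failures
```

Returning a list of messages rather than a boolean makes it easy to count first-run pass rates and to route each failure to an automatic fix.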

How do I stay up to date with Stera news?

Subscribe to our newsletter at the bottom of the page — one email a month with release notes, new datasets, and research updates. Follow @fpvlabs on X for shorter-form updates.