
Capture, process, and export multimodal data on hardware you already own.
Ten million frames of in-the-wild egocentric activity. 200 hours. 354 sessions. Longest continuous capture: 108 minutes. Every frame annotated with depth, 6-DoF pose, MANO hands, and an atomic-to-session-scale instruction tree. CC-BY-NC, on Hugging Face.
Start your own portable data lab with an iPhone Pro and the Stera App. ARKit fuses RGB, depth, IMU, and 6-DoF tracking entirely on-device, letting you capture ego, exo, third-person, or robot manipulation data anywhere. Just mount, record, and go.
One pipeline turns every session into the modalities that Physical AI needs. RGB-D. 6-DoF poses. 21 MANO articulations per hand. IMU. Upper-body coordinates. 3D mesh for real-to-sim. Hierarchical textual instruction trees, atomic to abstract. No human in the loop.
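To make the list of modalities concrete, here is a minimal Python sketch of how one annotated frame could be represented in your own code. It is an illustration under stated assumptions, not Stera's actual export schema: the names `FrameRecord`, `Pose6DoF`, and every field are hypothetical, and the sketch assumes each of the 21 hand articulations is stored as a 3D point.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical types for illustration only; not Stera's real export format.
Vec3 = Tuple[float, float, float]
HAND_ARTICULATIONS = 21  # per hand, as stated in the text above


@dataclass(frozen=True)
class Pose6DoF:
    position: Vec3                                     # x, y, z translation
    rotation_xyzw: Tuple[float, float, float, float]   # orientation quaternion


@dataclass(frozen=True)
class FrameRecord:
    session_id: str
    frame_index: int
    camera_pose: Pose6DoF
    left_hand: Optional[Sequence[Vec3]]    # None when the hand is not visible
    right_hand: Optional[Sequence[Vec3]]
    instruction_path: Tuple[str, ...]      # ordered abstract -> atomic

    def __post_init__(self) -> None:
        # Reject hands that do not carry exactly 21 articulation points.
        for name in ("left_hand", "right_hand"):
            hand = getattr(self, name)
            if hand is not None and len(hand) != HAND_ARTICULATIONS:
                raise ValueError(
                    f"{name} has {len(hand)} points, expected {HAND_ARTICULATIONS}"
                )


def atomic_instruction(record: FrameRecord) -> Optional[str]:
    """Return the most fine-grained instruction attached to a frame, if any."""
    return record.instruction_path[-1] if record.instruction_path else None
```

Storing the instruction tree as a root-to-leaf path per frame is one simple choice. It keeps each record self-contained, at the cost of repeating the higher-level labels across frames.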
Stera has abstracted away the infrastructure, allowing us to collect high-fidelity, bespoke multimodal data for world-model research.
We dropped Stera into our lab pipeline and were collecting research-grade ego data within a week. The dataset alone saved us months of sensor rigging.
datasets library.