MobileEgo Anywhere: Open Infrastructure for Long Horizon Egocentric Data on Commodity Hardware

Abstract

We introduce a framework for collecting extended egocentric video sequences using standard smartphone hardware. Alongside the framework, we release 200 hours of long-form egocentric data with persistent tracking and open-source our video processing infrastructure, STERA. The contribution aims to democratize robotics data collection by enabling hour-plus egocentric trajectories on ubiquitous mobile devices, and to support Vision-Language-Action (VLA) model development with standardized, training-ready data formats.

Contributions

STERA infrastructure

Open-source pipeline for processing long-horizon egocentric video captured on commodity smartphones.

200h dataset

Long-form egocentric trajectories with persistent tracking, released for VLA model development.

Training-ready format

Standardized outputs designed to drop into existing robotics and VLA training pipelines.

Commodity hardware

Hour-plus capture sessions using devices people already carry, with no specialized rig required.

Why it matters

Egocentric video data has historically been bottlenecked by specialized capture hardware and short session lengths. By targeting commodity smartphones and hour-plus horizons, this work lowers the barrier to building large, diverse datasets that match how people actually move through the world, which is a precondition for general-purpose Vision-Language-Action models.