LeRobot (Hugging Face) is to robotics what PyTorch is to deep learning: a standardized stack that lets one person go from zero to a fine-tuned VLA policy on real hardware in a weekend. It has three components: standardized datasets on the Hugging Face Hub (every dataset a versioned, queryable, parquet-backed object), a unified policy API (ACT, Diffusion Policy, π0, SmolVLA, and OpenVLA all interchangeable via the same CLI), and the SO-100/SO-101 arm (fully open hardware spec, $110-150 bill of materials, six servos driving five arm joints plus a gripper). The lerobot-record and lerobot-train commands form the production loop. This is the open-source counterweight to the closed industrial stacks (Isaac Lab for simulation, GR00T for the policy), and the fastest path for one person to collect real manipulation data, train a VLA, and deploy it on hardware they built. For the JHU humanoid capstone, LeRobot is the data infrastructure regardless of which policy you use.
The LeRobot dataset format
A LeRobot dataset is a Hugging Face Hub repository with three parts: one parquet file per episode (observation frames, action sequences, task metadata), a dataset card describing the hardware and the task, and a standardized schema that names the observation keys (observation.images.top, observation.images.wrist, observation.state) and the action key (action, expressed either as joint positions or as end-effector deltas). The schema is flexible enough to accommodate different hardware (SO-101, WidowX, Franka, custom arms) while enforcing enough structure that datasets from different sources can be loaded and trained on together.
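To make the schema idea concrete, here is a minimal sketch of a per-frame check written in plain Python. It does not use LeRobot's own loader or validator; the function name check_frame, the default camera list, and the sample frame are assumptions for illustration, and only the key names come from the text above.

```python
from typing import Mapping, Sequence

# Key names taken from the text above; real datasets may use other camera names.
STATE_KEY = "observation.state"
ACTION_KEY = "action"
IMAGE_PREFIX = "observation.images."


def check_frame(frame: Mapping[str, object],
                cameras: Sequence[str] = ("top", "wrist")) -> list[str]:
    """Return a list of problems found in one frame; an empty list means it passed."""
    problems = []
    for key in (STATE_KEY, ACTION_KEY):
        value = frame.get(key)
        if value is None:
            problems.append(f"missing key: {key}")
        elif not isinstance(value, (list, tuple)) or len(value) == 0:
            problems.append(f"{key} must be a non-empty list of numbers")
    for cam in cameras:
        if IMAGE_PREFIX + cam not in frame:
            problems.append(f"missing camera: {IMAGE_PREFIX + cam}")
    return problems


# Hypothetical frame: six joint values, image payload stubbed out as a string.
frame = {
    "observation.state": [0.0, 0.1, -0.2, 0.3, 0.0, 0.5],
    "action": [0.0, 0.1, -0.2, 0.3, 0.0, 0.6],
    "observation.images.top": "<top frame>",
}
print(check_frame(frame))  # ['missing camera: observation.images.wrist']
```

A check like this is cheap to run over every frame before upload, and it catches the most common cross-dataset mismatch (a missing or renamed camera) early.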
SmolVLA's community pretraining is the proof: Hugging Face aggregated hundreds of contributor datasets from the HF Hub, all in the LeRobot format, into a single pretraining corpus. A researcher in Tokyo who uploads a 30-episode SO-101 plate-stacking dataset is adding to the pool that this kind of community pretraining draws on. Versioning makes the process reproducible: Hub repositories are git-backed, so every upload is a commit with its own hash, and tags can mark released versions. If you record the revision alongside each training run, you can always trace which exact dataset version was used.
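One way to make that traceability routine is to write a small manifest next to every training run. The sketch below is an assumption about how you might organize this, not a LeRobot feature: the function names, the manifest fields, and the example repository id are all hypothetical. It accepts only a full commit hash, because a branch name such as main moves over time. The huggingface_hub download functions accept a revision argument, so the recorded hash can be passed back in later to fetch the same snapshot.

```python
import json
import re

# A full git commit hash is 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned(revision: str) -> bool:
    """True if `revision` is a full commit hash rather than a moving reference."""
    return FULL_SHA.fullmatch(revision) is not None


def run_manifest(repo_id: str, revision: str, policy: str) -> str:
    """Serialize which dataset snapshot a training run used (hypothetical format)."""
    if not is_pinned(revision):
        raise ValueError(f"{revision!r} is a moving reference; record a commit hash")
    record = {"dataset": repo_id, "revision": revision, "policy": policy}
    return json.dumps(record, sort_keys=True)


# Hypothetical repository id and commit hash, for illustration only.
print(run_manifest(
    "your-username/so101_plate_stacking",
    "0123456789abcdef0123456789abcdef01234567",
    "act",
))
```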
The unified policy API
LeRobot's policy API makes ACT, Diffusion Policy, SmolVLA, and OpenVLA interchangeable: lerobot-train --policy.type=smolvla vs lerobot-train --policy.type=act vs lerobot-train --policy.type=diffusion (check lerobot-train --help for the exact flag spelling in your installed version). The same dataset, the same evaluation loop, the same CLI. This is significant for the capstone: you can run a principled ablation over policies on identical data (50 episodes, 10 evaluation trials) without writing any training code. The comparison that matters for the capstone decision is ACT (deterministic transformer, fast inference) vs Diffusion Policy (models multimodal action distributions, slower inference) vs SmolVLA (VLM-conditioned, language-instructable). SmolVLA will typically win on tasks that require language instructions or generalization to new objects; ACT will often win on precision motor tasks with consistent demonstrations.
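With only 10 evaluation trials per policy, it helps to put an uncertainty range on each success rate before declaring a winner. The sketch below computes a 95% Wilson score interval in plain Python. The per-policy success counts are made-up placeholders, not measured results, and the choice of the Wilson interval is an assumption about how you might analyze the ablation.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical outcomes for 10 evaluation trials per policy (placeholders).
results = {"act": 8, "diffusion": 6, "smolvla": 7}
for name, wins in results.items():
    low, high = wilson_interval(wins, 10)
    print(f"{name}: {wins}/10, 95% interval {low:.2f} to {high:.2f}")
```

With counts like these the intervals overlap heavily, so a one- or two-trial gap at n=10 should be read as a hint rather than a ranking. More trials per policy, or harder task variants, are needed to separate close candidates.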
The SO-100/SO-101 arm
The hardware is simple: six servo-driven joints (five arm joints plus the gripper), open Fusion 360 CAD files, 3D-printable components, and a $110-150 bill of materials. The teleoperation controller (a leader arm or a SpaceMouse) connects via USB; lerobot-record handles data collection, time synchronization, and HF Hub upload. Building one takes 8-12 hours, and kits are available from several vendors in the LeRobot community.
Data discipline — the underrated part
The most important thing LeRobot teaches is data discipline. The temptation when recording manipulation demos is to collect 'enough' episodes without measuring their quality. LeRobot's dataset card forces you to document hardware, task description, success definition, failure modes, and per-episode success labels. The per-episode success labels are critical: training a policy on flawed demonstrations (failed attempts, or episodes where the human fumbled and then recovered) is a common source of policy confusion. The lerobot-record CLI includes a real-time visualization that lets you label each episode before uploading.
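As an illustration of how per-episode labels get used, here is a minimal sketch that splits episodes into a training set and a held-out set by success label. The EpisodeLabel record and its fields are hypothetical. They are not LeRobot's metadata format, and the labels below are invented examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeLabel:
    """Hypothetical per-episode record; field names are assumptions."""
    episode_index: int
    success: bool
    note: str = ""


def split_by_success(labels: list[EpisodeLabel]) -> tuple[list[int], list[int]]:
    """Return (indices to train on, indices to hold out for inspection)."""
    keep = [e.episode_index for e in labels if e.success]
    drop = [e.episode_index for e in labels if not e.success]
    return keep, drop


labels = [
    EpisodeLabel(0, True),
    EpisodeLabel(1, False, "dropped the plate, then recovered"),
    EpisodeLabel(2, True),
    EpisodeLabel(3, False, "gripper missed the rim"),
]
keep, drop = split_by_success(labels)
print(f"train on {keep}, hold out {drop}, success rate {len(keep) / len(labels):.0%}")
# train on [0, 2], hold out [1, 3], success rate 50%
```

Holding the failed episodes out rather than deleting them keeps them available for the failure-mode section of the dataset card.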
For the JHU humanoid capstone, the LeRobot dataset of 200 SO-101 manipulation episodes is one of the public output artifacts. It should be uploaded to HF Hub under your username, documented with a detailed dataset card, and verified against the LeRobot schema validator before use in training. This dataset is both a training artifact and a public portfolio piece — the data quality discipline is visible to anyone who downloads and examines it.
The SO-101 is a tabletop arm, not a humanoid. For the full JHU capstone (mobile humanoid), you will need to adapt the LeRobot data collection pipeline to your specific hardware. The dataset format and training CLI transfer directly; the hardware interface requires writing a custom LeRobot robot class.
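The shape of that hardware interface can be sketched without any LeRobot imports. Everything below is a hypothetical stand-in: RobotInterface is not LeRobot's actual base class, and the method names, the FakeHumanoidArm test double, and the record_episode loop are assumptions chosen to show the pattern of connect, observe, act, disconnect. Consult LeRobot's own robot base class for the methods it actually requires.

```python
from abc import ABC, abstractmethod


class RobotInterface(ABC):
    """Hypothetical interface; NOT LeRobot's real base class."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def get_observation(self) -> dict[str, list[float]]: ...

    @abstractmethod
    def send_action(self, action: list[float]) -> list[float]: ...

    @abstractmethod
    def disconnect(self) -> None: ...


class FakeHumanoidArm(RobotInterface):
    """In-memory robot for exercising a recording loop without hardware."""

    def __init__(self, num_joints: int = 7) -> None:
        self.joints = [0.0] * num_joints
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def get_observation(self) -> dict[str, list[float]]:
        return {"observation.state": list(self.joints)}

    def send_action(self, action: list[float]) -> list[float]:
        if not self.connected:
            raise RuntimeError("robot is not connected")
        if len(action) != len(self.joints):
            raise ValueError("action length must match the joint count")
        self.joints = list(action)  # a real robot would command its motors here
        return list(self.joints)

    def disconnect(self) -> None:
        self.connected = False


def record_episode(robot: RobotInterface, actions: list[list[float]]) -> list[dict]:
    """Step through actions, pairing each observation with the action sent after it."""
    frames = []
    robot.connect()
    try:
        for action in actions:
            obs = robot.get_observation()
            sent = robot.send_action(action)
            frames.append({**obs, "action": sent})
    finally:
        robot.disconnect()  # always release the hardware, even on error
    return frames
```

Writing the recording loop against an interface like this lets you test the data path with the fake robot first, then swap in the humanoid's driver once the frame layout matches the dataset schema.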