Hugging Face LeRobot and the open robotics stack

10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.

In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 34, note type = Basic.

Front	Back
What is the SO-101 arm and what is its bill of materials cost?	A six-DOF servo-driven tabletop arm with fully open hardware spec (Fusion 360 CAD, 3D-printable). Bill of materials: $110-150. The standard data collection hardware for LeRobot.
What CLI commands form the LeRobot production loop?	lerobot-record (data collection, time-sync, HF Hub upload) and lerobot-train (policy training: ACT, Diffusion Policy, SmolVLA, OpenVLA, selected via the --policy.type flag).
What storage format does LeRobot use for datasets?	Parquet files on the Hugging Face Hub, one per episode, with a standardized schema. Versioned and queryable; the repo can be updated, but each Hub revision is a fixed snapshot you can pin.
What three policies can be trained interchangeably with lerobot-train?	ACT (Action Chunking with Transformers), Diffusion Policy, and SmolVLA. OpenVLA, π0, and π0.5 are also available via compatible wrappers. Same CLI, same dataset, same eval loop.
What is ACT?	Action Chunking with Transformers (Zhao et al., 2023): a deterministic transformer policy that predicts a chunk of K future actions at once. Fast inference, good for precision motor tasks with consistent demonstrations.
Why is per-episode success labeling important during lerobot-record?	Training on failed demonstrations (where the human corrected a mistake mid-episode) is a common source of policy confusion — the policy learns inconsistent behaviors. Success labels let you filter or weight episodes before training.
How does the LeRobot dataset format enable SmolVLA community pretraining?	Standardized schema + HF Hub versioning allows Hugging Face to aggregate hundreds of contributor datasets into a single pretraining corpus. Any conformant public upload can be folded into that corpus without per-dataset conversion.
What is the public output artifact for the JHU capstone's LeRobot work?	A 200-episode SO-101 manipulation dataset uploaded to HF Hub under your username, documented with a detailed dataset card, verifiable against the LeRobot schema. Both a training artifact and a public portfolio piece.
On the same 50-episode dataset, which policy typically wins on precision motor tasks vs language-instructed generalization tasks?	ACT typically wins on precision motor tasks with consistent demonstrations (lower variance, fast inference). SmolVLA typically wins on language-instructed tasks and generalization to new objects (VLM backbone provides semantic grounding).
What is the primary LeRobot documentation reference?	LeRobot documentation, Hugging Face, current. github.com/huggingface/lerobot and huggingface.co/docs/lerobot.