From Tokens to Embodied Minds · Drill cards · Chapter 34
Drills

Hugging Face LeRobot and the open robotics stack

10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.


In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 34, note type = Basic.

Front	Back
What is the SO-101 arm, and what does its bill of materials cost?	A six-DOF, servo-driven tabletop arm with a fully open hardware spec (Fusion 360 CAD, 3D-printable parts). Bill of materials: $110–150. It is the standard data-collection hardware for LeRobot.
What CLI commands form the LeRobot production loop?	lerobot-record (data collection with time-synchronized streams, plus upload to the HF Hub) and lerobot-train (policy training; ACT, Diffusion Policy, SmolVLA, or OpenVLA is selected through the --policy options, e.g. --policy.type=act).
What storage format does LeRobot use for datasets?	Parquet files on the Hugging Face Hub, one per episode, with a standardized schema; camera streams are typically stored alongside as MP4 video. Datasets are versioned through Hub revisions, queryable, and immutable once a revision is uploaded.
Which policies can lerobot-train train interchangeably?	Three core policies: ACT (Action Chunking with Transformers), Diffusion Policy, and SmolVLA. OpenVLA, π0, and π0.5 are available through compatible wrappers. Same CLI, same dataset, same eval loop.
What is ACT?	Action Chunking with Transformers (Zhao et al., 2023): a transformer policy, deterministic at inference, that predicts a chunk of K future actions at once. Inference is fast, and it suits precision motor tasks with consistent demonstrations.
Why is per-episode success labeling important during lerobot-record?	Training on failed demonstrations (e.g. episodes where the operator corrected a mistake mid-episode) is a common source of policy confusion, because the policy learns inconsistent behaviors. Success labels let you filter out or down-weight those episodes before training.
How does the LeRobot dataset format enable SmolVLA community pretraining?	A standardized schema plus HF Hub versioning lets Hugging Face aggregate hundreds of contributor datasets into a single pretraining corpus. Any conformant upload is eligible for that corpus and can therefore contribute to SmolVLA's pretrained capabilities.
What is the public output artifact of the JHU capstone's LeRobot work?	A 200-episode SO-101 manipulation dataset uploaded to the HF Hub under your username, documented with a detailed dataset card, and verifiable against the LeRobot schema. It is both a training artifact and a public portfolio piece.
On the same 50-episode dataset, which policy typically wins on precision motor tasks, and which on language-instructed generalization tasks?	ACT typically wins on precision motor tasks with consistent demonstrations (lower variance, fast inference). SmolVLA wins on language-instructed tasks and on generalization to new objects, because its VLM backbone provides semantic grounding.
What is the primary LeRobot documentation reference?	LeRobot documentation, Hugging Face (current version): github.com/huggingface/lerobot and huggingface.co/docs/lerobot.
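The ACT card says the policy predicts a chunk of K future actions at once. The pure-Python sketch below shows one way to execute such chunks using temporal ensembling, the scheme described in the ACT paper. The policy is queried at every step. Every chunk that covers the current timestep contributes a prediction, weighted by w_i = exp(-m·i), where i = 0 is the oldest prediction. The `policy` callable, the decay m = 0.1, and the scalar actions are hypothetical simplifications. This is not LeRobot's implementation or its defaults.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List


def run_act_with_ensembling(
    policy: Callable[[int], List[float]],  # hypothetical: timestep -> chunk of K future actions
    horizon: int,
    m: float = 0.1,  # hypothetical decay; larger m favors older predictions more strongly
) -> List[float]:
    """Query the policy every step and blend all chunk predictions for each timestep."""
    # predictions[t] collects every action predicted for timestep t, oldest query first.
    predictions: Dict[int, List[float]] = defaultdict(list)
    executed: List[float] = []
    for t in range(horizon):
        chunk = policy(t)  # actions for timesteps t, t+1, ..., t+K-1
        for offset, action in enumerate(chunk):
            predictions[t + offset].append(action)
        candidates = predictions.pop(t)  # all chunks that cover timestep t
        # Exponential weights w_i = exp(-m * i), with i = 0 for the oldest prediction.
        weights = [math.exp(-m * i) for i in range(len(candidates))]
        blended = sum(w * a for w, a in zip(weights, candidates)) / sum(weights)
        executed.append(blended)
    return executed


# Usage: a toy policy whose whole chunk repeats the query time, so chunks disagree.
print(run_act_with_ensembling(lambda t: [float(t)] * 4, horizon=6))
```

Older predictions receive larger weights, so the executed action lags slightly behind the newest chunk. That lag is what smooths the jumps between successive chunks.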
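The storage, success-labeling, and schema cards can be tied together in one pre-training check. The sketch below verifies that every frame carries the expected columns and that timestamps increase within each episode. It then keeps only episodes labelled successful. The column names mirror common LeRobot frame fields (`episode_index`, `frame_index`, `timestamp`, `observation.state`, `action`). However, `REQUIRED_KEYS`, the per-episode `success` dictionary, and `select_training_episodes` are hypothetical illustrations, not LeRobot's API. Plain dicts stand in for Parquet rows so the example runs without extra dependencies.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical minimum schema; a real LeRobot dataset declares its features in its metadata.
REQUIRED_KEYS = {"episode_index", "frame_index", "timestamp", "observation.state", "action"}

Frame = Dict[str, object]


def select_training_episodes(frames: List[Frame], success: Dict[int, bool]) -> Dict[int, List[Frame]]:
    """Group frames by episode, reject schema violations, and keep only successful episodes."""
    episodes: Dict[int, List[Frame]] = defaultdict(list)
    for frame in frames:
        missing = REQUIRED_KEYS.difference(frame)  # iterating a dict yields its keys
        if missing:
            raise ValueError(f"frame is missing columns: {sorted(missing)}")
        episodes[frame["episode_index"]].append(frame)

    kept: Dict[int, List[Frame]] = {}
    for ep, ep_frames in episodes.items():
        ep_frames.sort(key=lambda f: f["frame_index"])
        times = [f["timestamp"] for f in ep_frames]
        # Timestamps must increase strictly within an episode.
        if any(b <= a for a, b in zip(times, times[1:])):
            raise ValueError(f"episode {ep} has non-increasing timestamps")
        # Unlabelled episodes are treated as failures and dropped.
        if success.get(ep, False):
            kept[ep] = ep_frames
    return kept
```

Raising on schema or timing violations, rather than silently skipping them, surfaces recording bugs before they reach training or a public Hub upload.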