From Tokens to Embodied Minds · Drill cards · Chapter 34
Drills
Hugging Face LeRobot and the open robotics stack
10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.
In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 34, note type = Basic.
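If you keep the cards in a script rather than using the provided export, a tab-separated file matching the import settings above can be produced with the standard library. This is a minimal sketch: the `cards` list and the `cards_to_tsv` helper are hypothetical names, and the two sample cards are abridged from this deck.

```python
import csv
import io

# Hypothetical sample: two abridged (front, back) pairs from this deck.
cards = [
    ("What is ACT?", "Action Chunking with Transformers (Zhao et al., 2023)."),
    ("What storage format does LeRobot use for datasets?",
     "Parquet files on the Hugging Face Hub."),
]


def cards_to_tsv(cards):
    """Serialize (front, back) pairs as tab-separated text, one note per line."""
    buffer = io.StringIO()
    # A tab delimiter matches the "field separator = Tab" import setting.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        writer.writerow([front, back])
    return buffer.getvalue()


tsv_text = cards_to_tsv(cards)
```

Writing `tsv_text` to a `.tsv` file gives one Basic note per line, with the first column as Front and the second as Back.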
| Front | Back |
|---|---|
| What is the SO-101 arm and what is its bill of materials cost? | A six-DOF servo-driven tabletop arm with a fully open hardware spec (Fusion 360 CAD, 3D-printable). Bill of materials: $110 to $150. The standard data collection hardware for LeRobot. |
| What CLI commands form the LeRobot production loop? | lerobot-record (data collection, time-sync, HF Hub upload) and lerobot-train (policy training: ACT, Diffusion Policy, SmolVLA, OpenVLA, selected via the --policy.type flag). |
| What storage format does LeRobot use for datasets? | Parquet files on the Hugging Face Hub, one per episode, with a standardized schema. Versioned, queryable, and immutable once uploaded. |
| What three policies can be trained interchangeably with lerobot-train? | ACT (Action Chunking with Transformers), Diffusion Policy, and SmolVLA. OpenVLA, π0, and π0.5 are also available via compatible wrappers. Same CLI, same dataset, same eval loop. |
| What is ACT? | Action Chunking with Transformers (Zhao et al., 2023): a deterministic transformer policy that predicts a chunk of K future actions at once. Fast inference, good for precision motor tasks with consistent demonstrations. |
| Why is per-episode success labeling important during lerobot-record? | Training on failed demonstrations (where the human corrected a mistake mid-episode) is a common source of policy confusion — the policy learns inconsistent behaviors. Success labels let you filter or weight episodes before training. |
| How does the LeRobot dataset format enable SmolVLA community pretraining? | Standardized schema + HF Hub versioning allows Hugging Face to aggregate hundreds of contributor datasets into a single pretraining corpus. Any conformant upload can be folded into that corpus and so contribute to SmolVLA's pretrained capabilities. |
| What is the public output artifact for the JHU capstone's LeRobot work? | A 200-episode SO-101 manipulation dataset uploaded to HF Hub under your username, documented with a detailed dataset card, verifiable against the LeRobot schema. Both a training artifact and a public portfolio piece. |
| On the same 50-episode dataset, which policy typically wins on precision motor tasks vs language-instructed generalization tasks? | ACT typically wins on precision motor tasks with consistent demonstrations (lower variance, fast inference). SmolVLA typically wins on language-instructed tasks and generalization to new objects (VLM backbone provides semantic grounding). |
| What is the primary LeRobot documentation reference? | LeRobot documentation, Hugging Face, current. github.com/huggingface/lerobot and huggingface.co/docs/lerobot. |
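
The success-labeling card says labels let you filter or weight episodes before training. A minimal sketch of both options follows, assuming each episode carries a boolean success flag. The `EpisodeRecord` class, its fields, and the `failure_weight` default are hypothetical and chosen for illustration; they are not the LeRobot schema or API.

```python
from dataclasses import dataclass


@dataclass
class EpisodeRecord:
    # Hypothetical fields for illustration; not the LeRobot dataset schema.
    episode_index: int
    success: bool


def select_successful(episodes):
    """Filtering: keep only the indices of episodes labeled successful."""
    return [ep.episode_index for ep in episodes if ep.success]


def success_weights(episodes, failure_weight=0.25):
    """Weighting: keep every episode but down-weight the failed ones."""
    return {
        ep.episode_index: 1.0 if ep.success else failure_weight
        for ep in episodes
    }


episodes = [
    EpisodeRecord(episode_index=0, success=True),
    EpisodeRecord(episode_index=1, success=False),
    EpisodeRecord(episode_index=2, success=True),
]

kept = select_successful(episodes)   # indices to train on when filtering
weights = success_weights(episodes)  # per-episode sampling weights
```

Filtering is the simpler choice when failures are rare. Weighting keeps the recovery behavior in the data while reducing its influence, which may matter when the dataset is small.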