From Tokens to Embodied Minds · Drill cards · Chapter 36
Capstone — a humanoid home assistant, end to end
10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.
In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 36, note type = Basic.
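
For readers who keep their own cards alongside this deck, the two-column Tab-separated layout the import step expects can be sketched as follows. This is a minimal illustration using only Python's standard `csv` module; `cards_to_tsv` is a hypothetical helper name, not part of Anki or of the chapter's tooling, and the sample card is abridged from the table below.

```python
import csv
import io


def cards_to_tsv(cards):
    """Serialize (front, back) pairs as tab-separated text, one note per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        # Collapse tabs and newlines inside a field to single spaces so each
        # note stays on one line with exactly two columns.
        writer.writerow([" ".join(front.split()), " ".join(back.split())])
    return buf.getvalue()


# Illustrative card, abridged from this deck.
cards = [
    (
        "What does the LangGraph confirmation state do?",
        "It blocks execution of any irreversible physical action until the user confirms.",
    ),
]
tsv_text = cards_to_tsv(cards)
```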
| Front | Back |
|---|---|
| Name the six subsystems of the JHU humanoid capstone. | 1. Perception (DINOv2 + SAM 2 + 3DGS). 2. Planning (LangGraph + MCP tools). 3. Policy (SmolVLA or GR00T N1.5). 4. Controller (IK + PD/impedance, 1 kHz). 5. Simulation (Isaac Lab + Newton). 6. Safety (5-layer filter stack) + Observability (rerun.io + Langfuse). |
| What is the interface format between the policy and the controller? | Joint-position targets in radians, timestamped at 30 Hz. The trajectory interpolator converts the K-step action chunk into per-timestep targets; the IK solver converts end-effector targets into joint angles for the PD controller. |
| What is the 50-task regression suite and why is it the honest measure of capstone completion? | A predefined set of 50 canonical tasks (20 tabletop manipulation, 15 navigation+manipulation, 10 multi-step sequences, 5 safety-filter tests), each with pass/fail criteria. It is honest because it prevents cherry-picking: a demo can show the two tasks that always work, whereas the regression suite tests the full distribution. |
| What observability tools does the capstone use? | rerun.io for real-time trajectory replay and debugging (joint positions, end-effector poses, camera frames, policy actions). Langfuse for planner LLM traces (prompt, response, latency, token cost). Both can also be used offline for post-hoc analysis. |
| Which published reference architecture is closest to the JHU capstone? | 1X Technologies NEO Gamma deployment of GR00T N1 (NVIDIA-1X partnership, March 18, 2025). Full stack: Isaac Lab sim + GR00T policy + on-robot deployment. |
| What is the minimum LeRobot episode count for a meaningful fine-tuned manipulation policy? | 200 episodes, which is the capstone target. Below 50 episodes, most policies overfit; 50 to 100 are sufficient for simple pick-and-place; 200 cover the diversity of household manipulation variants. |
| What does the LangGraph confirmation state do? | It blocks execution of any irreversible physical action until the user confirms (voice, button, or app). This is the human-in-the-loop gate required by the Chapter 35 safety layer. |
| How does the 3DGS scene map get updated online during robot operation? | New wrist-camera frames are used to add or refine Gaussian primitives (gsplat online densification). SAM 2 re-segments any new objects that enter the field of view. The result is a continuously updated labeled 3D scene map. |
| What is the role of Isaac Lab in the capstone beyond policy training? | 1. Development: 47 of the 50 canonical regression tasks can be evaluated in simulation before touching real hardware. 2. Data augmentation: GR00T-Dreams generates synthetic training data from the 200-episode LeRobot dataset to extend policy coverage. |
| What MCP tools does the LangGraph planner use? | Home-control tools exposed via MCP (Ch 21): get_object_position (from perception map), find_object_by_type, home automation controls (lights, calendar, recipe DB). Robot action primitives (pick_up, place_on, open_drawer) are MCP tool calls that resolve to SmolVLA/GR00T policy invocations with concrete object coordinates. |
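
The policy-to-controller card above says a trajectory interpolator turns the K-step action chunk into per-timestep targets. The sketch below shows one way that step might look, assuming plain linear interpolation between consecutive 30 Hz joint targets and resampling at the 1 kHz controller rate. The function name `interpolate_chunk`, its parameters, and the choice of linear interpolation are illustrative assumptions; the capstone's actual interpolator is not specified on these cards.

```python
def interpolate_chunk(chunk, chunk_hz=30.0, control_hz=1000.0):
    """Linearly resample a K-step chunk of joint targets (radians).

    chunk: list of K joint-position vectors spaced 1/chunk_hz seconds apart.
    Returns (time_s, joint_vector) pairs spaced 1/control_hz apart, covering
    the span from the first chunk step up to (at most) the last one.
    Linear interpolation is an assumption made for this sketch.
    """
    if not chunk:
        return []
    if len(chunk) == 1:
        return [(0.0, list(chunk[0]))]

    duration = (len(chunk) - 1) / chunk_hz
    # Small epsilon guards against floating-point round-down at exact multiples.
    n_ticks = int(duration * control_hz + 1e-9) + 1

    targets = []
    for tick in range(n_ticks):
        t = tick / control_hz
        pos = min(t * chunk_hz, len(chunk) - 1)  # fractional index into chunk
        i = min(int(pos), len(chunk) - 2)        # left neighbour
        alpha = pos - i                          # blend weight in [0, 1]
        q = [(1.0 - alpha) * a + alpha * b
             for a, b in zip(chunk[i], chunk[i + 1])]
        targets.append((t, q))
    return targets


# Illustrative 3-step chunk for a 2-joint arm (values are made up).
chunk = [[0.0, 0.0], [0.3, -0.3], [0.6, 0.0]]
targets = interpolate_chunk(chunk)
```

Each returned pair is a timestamp and a joint-position vector that a PD or impedance loop could track; a real system would more likely use a smoother scheme (for example, splines with velocity limits) than straight-line blending.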