From Tokens to Embodied Minds  ·  Drill cards · Chapter 36
Drills

Capstone — a humanoid home assistant, end to end

10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.

In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 36, note type = Basic.
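If you would rather regenerate the TSV yourself than download it, a minimal sketch using Python's standard `csv` module follows. The function name `cards_to_tsv` and the sample card list are illustrative assumptions, not part of the chapter's tooling; the only requirement taken from the import settings above is one note per line, with front and back separated by a Tab.

```python
import csv
import io

def cards_to_tsv(cards):
    """Serialize (front, back) pairs as tab-separated text, one note per line.

    Hypothetical helper for illustration; fields that contain tabs or
    newlines are quoted by the csv writer.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        writer.writerow([front, back])
    return buf.getvalue()

# Sample card taken from the deck below; extend the list with the other nine.
cards = [
    ("What is the interface format between the policy and the controller?",
     "Joint-position targets in radians, timestamped at 30 Hz."),
]
tsv_text = cards_to_tsv(cards)
```

Write `tsv_text` to a `.tsv` file encoded as UTF-8 and import it with the settings above.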

Front	Back
Name the six subsystems of the JHU humanoid capstone.	1. Perception (DINOv2 + SAM 2 + 3DGS). 2. Planning (LangGraph + MCP tools). 3. Policy (SmolVLA or GR00T N1.5). 4. Controller (IK + PD/impedance, 1 kHz). 5. Simulation (Isaac Lab + Newton). 6. Safety (5-layer filter stack) + Observability (rerun.io + Langfuse).
What is the interface format between the policy and the controller?	Joint-position targets in radians, timestamped at 30 Hz. The trajectory interpolator converts the K-step action chunk to per-timestep targets; the IK solver converts end-effector targets to joint angles for the PD controller.
What is the 50-task regression suite and why is it the honest measure of capstone completion?	A predefined set of 50 canonical tasks (20 tabletop manipulation, 15 navigation+manipulation, 10 multi-step sequences, 5 safety-filter tests) with pass/fail criteria for each. It is honest because it prevents cherry-picking: a demo can show the 2 tasks that always work; the regression suite tests the full distribution.
What observability tools does the capstone use?	rerun.io for real-time trajectory replay and debugging (joint positions, end-effector poses, camera frames, policy actions). Langfuse for planner LLM traces (prompt, response, latency, token cost). Both also run offline for post-hoc analysis.
Which published reference architecture is closest to the JHU capstone?	1X Technologies NEO Gamma deployment of GR00T N1 (NVIDIA-1X partnership, March 18, 2025). Full stack: Isaac Lab sim + GR00T policy + on-robot deployment.
What is the minimum LeRobot episode count for a meaningful fine-tuned manipulation policy?	200 episodes, which is the capstone target. Below 50, most policies overfit; 50-100 is sufficient for simple pick-and-place; 200 covers the diversity of household manipulation variants.
What does the LangGraph confirmation state do?	It blocks execution of any irreversible physical action until the user confirms (voice, button, or app). This is the human-in-the-loop gate required by the Chapter 35 safety layer.
How does the 3DGS scene map get updated online during robot operation?	New wrist-camera frames are used to add or refine Gaussian primitives (gsplat online densification). SAM 2 re-segments any new objects that enter the field of view. The result is a continuously updated labeled 3D scene map.
What is the role of Isaac Lab in the capstone beyond policy training?	1. Development: 47 of the 50 canonical regression tasks can be evaluated in simulation before touching real hardware. 2. Data augmentation: GR00T-Dreams generates synthetic training data from the 200-episode LeRobot dataset to extend policy coverage.
What MCP tools does the LangGraph planner use?	Home-control tools exposed via MCP (Ch 21): get_object_position (from perception map), find_object_by_type, home automation controls (lights, calendar, recipe DB). Robot action primitives (pick_up, place_on, open_drawer) are MCP tool calls that resolve to SmolVLA/GR00T policy invocations with concrete object coordinates.
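To make the policy-to-controller card concrete, here is a minimal sketch of how a K-step action chunk emitted at 30 Hz might be resampled to per-timestep joint targets for a 1 kHz control loop. The cards do not specify the interpolation scheme, so linear interpolation is an assumption here, and the function name `interpolate_chunk`, its parameters, and the sample values are hypothetical.

```python
def interpolate_chunk(chunk, policy_hz=30.0, control_hz=1000.0):
    """Linearly interpolate a K-step chunk of joint targets to the control rate.

    chunk: list of K waypoints, each a list of joint positions in radians,
           spaced 1/policy_hz seconds apart.
    Returns a list of (time_s, joint_targets) tuples spanning the chunk.
    Linear interpolation is an assumption; a real controller may use splines.
    """
    if len(chunk) < 2:
        raise ValueError("need at least two waypoints to interpolate")
    duration = (len(chunk) - 1) / policy_hz
    n_steps = int(round(duration * control_hz))
    targets = []
    for i in range(n_steps + 1):
        t = i / control_hz
        # Position along the chunk in waypoint-index units, clamped to the end.
        s = min(t * policy_hz, len(chunk) - 1)
        k = min(int(s), len(chunk) - 2)  # index of the left waypoint
        alpha = s - k                    # blend weight toward the right waypoint
        q = [(1 - alpha) * a + alpha * b for a, b in zip(chunk[k], chunk[k + 1])]
        targets.append((t, q))
    return targets

# Made-up chunk: K = 3 waypoints for a 2-joint arm.
chunk = [[0.0, 0.0], [0.3, -0.3], [0.6, -0.6]]
targets = interpolate_chunk(chunk)
```

Each tuple in `targets` is one setpoint for the PD controller. When the policy outputs end-effector targets instead, the IK solver would run before this step to produce the joint-space waypoints.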