From Tokens to Embodied Minds · Drill cards · Chapter 35
Drills
Safety and alignment for embodied AI
10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.
In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 35, note type = Basic.
| Front | Back |
|---|---|
| What is physical irreversibility and why does it matter for robot safety? | Unlike text outputs, which can be regenerated at zero cost, robot actions have permanent physical consequences: a broken object, a bruised human, a damaged joint. Safety filters must therefore be deterministic and hardware-enforced; the policy is too slow (5–30 Hz) and too easy to manipulate to serve as the safety mechanism. |
| What is visual prompt injection? | An adversarial attack in which text or patterns embedded in an image cause a VLM backbone to output a different instruction than the one intended. In an embodied VLA, a sticker with adversarial text on a household object can cause the robot to execute an unsafe or unintended action. |
| What is the primary reference for the robot AI safety threat taxonomy? | Concrete Problems in AI Safety, Amodei et al., arXiv:1606.06565, June 21, 2016. Still the cleanest taxonomy: reward hacking, negative side effects, safe exploration, distributional shift, scalable oversight. |
| At what frequency must the controller-level safety filter run to be effective? | 1 kHz, the inner control-loop rate. The VLA runs at only 5–30 Hz, while hardware events (a joint approaching its limit, unexpected contact) unfold within milliseconds. The filter must therefore run at least as fast as the inner control loop it protects, far faster than the policy whose commands it checks. |
| What is the shutdown problem in embodied AI? | Ensuring that any human present can safely shut the robot down without the policy resisting the shutdown. Still an open research problem as of 2026. Engineering response: a physical interrupt button that cuts power at the supply level, bypassing all software. |
| What is a contact force ceiling and how is it enforced? | A maximum allowed contact force (typically 20–40 N for household manipulation), measured by a wrist-mounted force-torque sensor. If the ceiling is exceeded, the controller triggers an emergency stop independently of the policy. |
| Why must joint limits be enforced in servo firmware rather than in the Python policy? | The policy runs at 5–30 Hz and can be adversarially manipulated. Firmware runs at 1 kHz or faster and cannot be overridden by policy output. Hardware-level enforcement is the only guarantee. |
| What is the second threat surface of embodied AI (partial observability of human intent)? | A home robot operating in a shared space cannot observe human intentions — only position and motion. A human reaching toward the same space the robot is moving into is a collision hazard. The 2026 engineering response: restricted operating envelopes, slow speeds near humans, mandatory confirmation gates before any action that moves toward a human. |
| What is the visual prompt injection reference paper? | Visual Adversarial Examples Jailbreak Aligned Large Language Models, Qi et al., arXiv:2306.13213, June 22, 2023. |
| Is the safety layer in the JHU capstone optional? | No. It is non-negotiable engineering. The capstone evaluation includes a test suite of 20 deliberately unsafe commands that must all be caught by the safety stack before any public demo. |
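The partial-observability card lists two engineering responses: slow speeds near humans, and confirmation gates before motion toward a human. The sketch below covers both, assuming a perception module already supplies the nearest human's position in metres. It consists of a distance-based speed cap and a check on whether a commanded move ends closer to that human. The function names, radii, and speeds are hypothetical placeholders, not values from this chapter.

```python
import math

Point = tuple[float, float, float]  # (x, y, z) in metres, robot base frame


def speed_cap(distance_to_human_m: float,
              v_max: float = 0.5,          # hypothetical free-space speed, m/s
              v_slow: float = 0.1,         # hypothetical near-human speed, m/s
              stop_radius_m: float = 0.3,  # hypothetical: no motion inside this radius
              slow_radius_m: float = 1.0) -> float:
    """Speed cap (m/s) as a function of distance to the nearest human."""
    if distance_to_human_m <= stop_radius_m:
        return 0.0
    if distance_to_human_m >= slow_radius_m:
        return v_max
    # Linear ramp from v_slow just outside the stop radius up to v_max at the slow radius.
    frac = (distance_to_human_m - stop_radius_m) / (slow_radius_m - stop_radius_m)
    return v_slow + frac * (v_max - v_slow)


def needs_confirmation(ee_now: Point, ee_goal: Point, human: Point) -> bool:
    """True if the commanded end-effector motion ends closer to the human."""
    return math.dist(ee_goal, human) < math.dist(ee_now, human)


if __name__ == "__main__":
    print(speed_cap(2.0))  # 0.5: far from any human
    print(speed_cap(0.2))  # 0.0: inside the stop radius
    human = (1.0, 0.0, 0.0)
    print(needs_confirmation((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), human))  # True
```

Comparing only the start and goal distances is a simplification. A stricter gate would check every waypoint of the planned path, because a motion can pass close to a person and then move away.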