Safety and alignment for embodied AI

10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.

In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 35, note type = Basic.

Front	Back
What is physical irreversibility and why does it matter for robot safety?	Unlike text outputs (retryable at zero cost), robot actions have permanent physical consequences — a broken object, a bruised human, a damaged joint. Safety filters must be deterministic and hardware-enforced; the policy is too slow (5-30Hz) and too manipulable to serve as the safety mechanism.
What is visual prompt injection?	An adversarial attack where text or patterns embedded in an image cause a VLM backbone to output a different instruction than intended. Applied to embodied VLAs: a sticker with adversarial text on a household object can cause the robot to execute an unsafe or unintended action.
What is the primary reference for a robot AI safety threat taxonomy?	Concrete Problems in AI Safety, Amodei et al., arXiv:1606.06565, June 21, 2016. Its running example is a cleaning robot. Still the cleanest taxonomy: reward hacking, negative side effects, safe exploration, distributional shift, scalable oversight.
At what frequency must the controller-level safety filter run to be effective?	1 kHz, the inner control-loop frequency. The VLA runs at only 5-30 Hz, and hardware events (a joint approaching its limit, unexpected contact) can develop between two policy steps. The safety filter must therefore run at least as fast as the control loop it protects, checking every command before it reaches the actuators.
What is the shutdown problem in embodied AI?	Ensuring that any human present can safely shut the robot down, without the policy resisting the shutdown. Still open research in 2026. Engineering response: a physical interrupt button at the power-supply level, bypassing all software.
What is a contact force ceiling and how is it enforced?	A maximum allowed contact force (typically 20-40N for household manipulation) measured by a wrist force-torque sensor. If exceeded, an emergency stop is triggered at the controller level — independently of the policy.
Why must joint limits be enforced in servo firmware rather than in the Python policy?	The policy runs at 5-30Hz and can be adversarially manipulated. Firmware runs at 1kHz+ and cannot be overridden by policy output. Hardware-level enforcement is the only guarantee.
What is the second threat surface of embodied AI (partial observability of human intent)?	A home robot operating in a shared space cannot observe human intentions — only position and motion. A human reaching toward the same space the robot is moving into is a collision hazard. The 2026 engineering response: restricted operating envelopes, slow speeds near humans, mandatory confirmation gates before any action that moves toward a human.
What is the visual prompt injection reference paper?	Visual Adversarial Examples Jailbreak Aligned Large Language Models, Qi et al., arXiv:2306.13213, June 22, 2023.
Is the safety layer in the JHU capstone optional?	No. It is a non-negotiable engineering requirement. The capstone evaluation includes a test suite of 20 deliberately unsafe commands, all of which the safety stack must catch before any public demo.
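The enforcement logic behind the cards on the 1 kHz safety filter, the contact force ceiling, and joint limits fits in a few lines. What follows is a minimal sketch, not a reference implementation. It assumes a 30 N ceiling (inside the 20-40 N household range) and made-up joint ranges, and the names SafetyLimits, SafetyFilter, step and human_reset are hypothetical. Python is used only for readability: per the joint-limit card, the real check belongs in servo firmware, not in the policy process.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    """Hypothetical per-robot limits; real values come from the arm's datasheet."""
    joint_min: tuple[float, ...]       # radians, one entry per joint
    joint_max: tuple[float, ...]       # radians, one entry per joint
    max_contact_force_n: float = 30.0  # assumed ceiling inside the 20-40 N range


class SafetyFilter:
    """Illustrative controller-level filter, called once per control tick
    (e.g. at 1 kHz), after the policy has proposed a target and before anything
    is sent to the motors. It never consults the policy."""

    def __init__(self, limits: SafetyLimits) -> None:
        self.limits = limits
        self.estopped = False  # latched: only human_reset() clears it

    def step(self, target: list[float], wrist_force_n: float) -> list[float] | None:
        """Return a safe joint target, or None to command an emergency stop."""
        if self.estopped:
            return None
        # Malformed or non-finite inputs count as faults: a NaN target would
        # otherwise pass straight through the min/max clamp below.
        if (len(target) != len(self.limits.joint_min)
                or not math.isfinite(wrist_force_n)
                or not all(math.isfinite(q) for q in target)):
            self.estopped = True
            return None
        # Contact force ceiling: stop regardless of what the policy wants.
        if wrist_force_n > self.limits.max_contact_force_n:
            self.estopped = True
            return None
        # Joint limits: clamp each commanded angle into its allowed range.
        return [
            min(max(q, lo), hi)
            for q, lo, hi in zip(target, self.limits.joint_min, self.limits.joint_max)
        ]

    def human_reset(self) -> None:
        """Intended to be wired to a physical operator control, never to the policy."""
        self.estopped = False


if __name__ == "__main__":
    f = SafetyFilter(SafetyLimits(joint_min=(-1.0, -2.0), joint_max=(1.0, 2.0)))
    print(f.step([1.5, 0.0], wrist_force_n=5.0))   # [1.0, 0.0]: joint 0 clamped
    print(f.step([0.0, 0.0], wrist_force_n=45.0))  # None: force ceiling tripped
    print(f.step([0.0, 0.0], wrist_force_n=5.0))   # None: still latched
```

The stop is latched on purpose: once the force ceiling trips, the filter keeps returning None until a human resets it, so the policy cannot resume motion simply by issuing a new command on the next tick.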