From Sand to Superintelligence · Drill cards · Chapter 33
Latency Is Cognition
9 atomic recall cards. Export to Anki and let spaced repetition do its slow work.
In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Sand to Silicon · Ch 33, note type = Basic.
| Front | Back |
|---|---|
| What is the minimum transatlantic TCP round-trip time cited in the chapter? | ~80 ms, a floor set by the speed of light in fibre. |
| What per-token generation latency does the chapter use in its budget example? | ~50 ms per token. |
| What is the first-token latency of a major frontier model, per the chapter's stats? | ~30–100 ms. |
| What HBM bandwidth does the chapter attribute to a Rubin GPU? | Roughly 22 TB/s of HBM4 bandwidth. |
| What classic usability finding does the chapter invoke, and what is its threshold? | Nielsen's 1993 response-time work: about ten seconds is the limit for keeping the user's attention; beyond that it starts to wander. |
| What is the realistic latency floor for an agent that wants to take more than a couple of actions? | Thirty seconds, per the chapter. |
| What three standard speculative/parallel techniques does the chapter name for reducing perceived latency? | Speculative decoding, parallel tool calls, and streaming with incremental rendering. |
| What industry analogy does the chapter use for the likely long-run shape of model geography? | Content delivery networks from the 2000s: heavy models in a few large data centers, smaller distilled models in regional points of presence close to users and data. |
| What does the chapter say is the 'real currency' of agentic systems? | The latency budget. |
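
To make the "latency budget" idea concrete, here is a minimal sketch that combines figures from the cards above (80 ms round trip, 100 ms first token, 50 ms per token, the ten-second attention limit) into a per-step cost. The cost model itself is an assumption for illustration, not the chapter's: one round trip per step, a 20-token response, and three hypothetical tool calls taking 300, 400, and 500 ms. The function names are likewise invented for this example.

```python
# Figures quoted on the cards above.
RTT_MS = 80                  # transatlantic TCP round trip
FIRST_TOKEN_MS = 100         # upper end of the 30-100 ms first-token range
PER_TOKEN_MS = 50            # per-token generation latency
ATTENTION_LIMIT_MS = 10_000  # Nielsen's ten-second threshold


def step_latency_ms(output_tokens, tool_latencies_ms, parallel):
    """Latency of one agent step under the assumed model:
    one network round trip + token generation + tool calls."""
    # The first token costs FIRST_TOKEN_MS; each later token costs PER_TOKEN_MS.
    generation = FIRST_TOKEN_MS + PER_TOKEN_MS * max(output_tokens - 1, 0)
    # Parallel tool calls cost as much as the slowest one; serial calls add up.
    if parallel:
        tools = max(tool_latencies_ms, default=0)
    else:
        tools = sum(tool_latencies_ms)
    return RTT_MS + generation + tools


def steps_within_budget(budget_ms, per_step_ms):
    """How many whole steps fit inside the budget."""
    return int(budget_ms // per_step_ms)


# Hypothetical tool latencies, chosen only for illustration.
tools = [300, 400, 500]
serial = step_latency_ms(20, tools, parallel=False)
concurrent = step_latency_ms(20, tools, parallel=True)

print(serial, steps_within_budget(ATTENTION_LIMIT_MS, serial))          # 2330 4
print(concurrent, steps_within_budget(ATTENTION_LIMIT_MS, concurrent))  # 1630 6
```

Under these assumed numbers, issuing the tool calls in parallel raises the number of steps that fit inside ten seconds from four to six, which is the sense in which the budget, rather than raw model speed, constrains what an agent can do.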