What is the minimum transatlantic TCP round-trip time cited in the chapter?	~80 ms, a floor set by the speed of light in fibre.
What per-token generation latency does the chapter use in its budget example?	~50 ms per token.
What is first-token latency from a major frontier model, per the chapter's stats?	~30–100 ms.
What HBM bandwidth does the chapter attribute to a Rubin GPU?	Roughly 22 TB/s of HBM4 bandwidth.
What classic usability finding does the chapter invoke, and what is its threshold?	Nielsen's 1993 response-time work: about ten seconds is the limit for keeping the user's attention on the task; beyond that, attention starts to wander.
What is the realistic latency floor for an agent that wants to take more than a couple of actions?	Thirty seconds, per the chapter.
What three standard speculative/parallel techniques does the chapter name for reducing perceived latency?	Speculative decoding, parallel tool calls, and streaming with incremental rendering.
What industry analogy does the chapter use for the likely long-run shape of model geography?	Content delivery networks from the 2000s: heavy models in a few large data centers, smaller distilled models in regional points of presence close to users and data.
What does the chapter say is the 'real currency' of agentic systems?	The latency budget.
