From Sand to Superintelligence  ·  Drill cards · Chapter 30
Drills

A Thought, Token by Token

10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.

10 cards due for review

In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Sand to Silicon · Ch 30, note type = Basic.

FrontBack
What is a 'token' in the context of a frontier LLM?A vocabulary unit — a subword piece — produced by a learned tokenizer like tiktoken or SentencePiece. Encoded as an integer in the range 0 to ~vocab_size.
Roughly how many multiplications per generated token on a frontier model?~10^11 — a few hundred billion. A full response of a few thousand tokens climbs into the quadrillions (~10^15).
Roughly how much wall-clock time per token, on a frontier system?~50 ms.
Roughly how much energy per generated token?~1 joule.
What is the embedding table's shape?(vocab_size × hidden_dim), e.g. ~100,000 × ~16,000.
What three tensors does attention compute from the input?Queries (Q), keys (K), and values (V), each via a linear projection of the input.
What does the QK^T matrix represent, post-softmax?A weighted attention pattern: how much each token should attend to every other token.
What does the unembedding step produce?Logits — one real number per vocabulary entry — which softmax converts into a probability distribution.
Why does KV caching speed up the autoregressive loop?Previous tokens' K and V tensors are cached, so only the new token requires a fresh forward pass instead of recomputing the entire prefix.
Name 4 sampling rules used at the softmax-to-token step.Argmax (greedy), top-k, top-p (nucleus), temperature-scaled sampling.