Glossary

103 terms used in the book — band gaps, MOSFETs, MMUs, tensor cores. Each entry links to the chapter where the idea first appears.

#

µop (micro-op) Ch 27: An internal sub-instruction a CPU's decoder produces from a machine-code instruction. Modern x86 cores execute these out-of-order.

A

A2A Ch 36: Agent-to-Agent protocol — Google's open standard (2025) for letting one agent invoke another over the network.
Agent Ch 34: A model placed in a loop with tools, memory, and a goal — capable of multi-step action without per-step human prompting.
ALU Ch 20: Arithmetic Logic Unit. The part of a CPU that performs arithmetic and bitwise operations.
ASML Ch 8: The Dutch company that builds the world's only EUV lithography machines.
Assembly Ch 27: A human-readable form of machine code, with named registers and labels.
AST Ch 27: Abstract Syntax Tree — a tree representation of a program's structure produced by a parser.
Attention Ch 29: A neural-network operation that lets each token blend information from every other token via Q, K, V matrices.

B

Band gap Ch 17: The energy distance between a material's valence and conduction bands. Silicon's is 1.12 eV.
BF16 Ch 29: Brain Floating Point 16 — a 16-bit float format with FP32's exponent range and reduced mantissa, used in deep learning.
BIOS Ch 25: Basic Input/Output System — the legacy firmware that boots a PC. Replaced by UEFI on modern systems.
Bootloader Ch 25: A small program (e.g. GRUB) that loads an operating-system kernel into memory and jumps to it.
Bytecode Ch 27: A compact, platform-independent intermediate representation of a program, executed by a virtual machine.
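
The BF16 entry above can be made concrete with a small sketch. It assumes the simplest possible conversion, dropping the low 16 bits of an FP32 value with no rounding; real hardware and libraries usually round to nearest, and the function names here are illustrative only.

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x by truncating its FP32 encoding.

    Assumption: plain truncation (round toward zero), chosen for clarity.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits

def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    (x,) = struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))
    return x

def to_bf16(x: float) -> float:
    """Round-trip x through BF16 to see what precision survives."""
    return bf16_bits_to_float(fp32_to_bf16_bits(x))

print(hex(fp32_to_bf16_bits(1.0)))   # 0x3f80
print(to_bf16(3.141592653589793))    # 3.140625
```

Because the exponent field is untouched, a value such as 1e38 stays finite after conversion; only mantissa precision is lost.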

C

Cache line Ch 24: The smallest unit a CPU cache moves between memory levels — typically 64 bytes.
Channel Ch 18: The narrow conducting region under a MOSFET's gate, formed when the gate voltage is high enough to invert the surface.
Clock Ch 21: A regular signal that synchronizes all the flip-flops on a chip. Modern CPUs run at ~3-5 GHz.
CMOS Ch 18: Complementary Metal-Oxide-Semiconductor — the dominant transistor technology, pairing n-MOS and p-MOS for low static power.
Coalescing Ch 28: On a GPU, the property of a memory access pattern that lets the hardware fold many threads' reads into a single transaction.
Compiler Ch 27: A program that translates source code into a lower-level language, typically machine code via an IR.
Conduction band Ch 17: The band of energy levels in a solid in which electrons are free to move and conduct current.
Context switch Ch 26: The kernel operation of saving one process's state and loading another's. Costs a few microseconds on modern hardware.
Context window Ch 32: The maximum number of tokens a model can attend to at once. The price of one full window is the unit cost of one 'thought'.
CoWoS Ch 12: Chip-on-Wafer-on-Substrate — TSMC's 2.5D packaging that mounts GPU dies and HBM stacks on a silicon interposer.
CPU Ch 22: Central Processing Unit. A general-purpose processor optimized for branchy, latency-sensitive single-thread work.
CUDA Ch 28: NVIDIA's parallel-programming model and runtime for GPUs.
Czochralski Ch 4: The process of growing a single silicon crystal by dipping a seed into a melt and slowly pulling upward.
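
The cache-line and coalescing entries above share one piece of arithmetic: how many fixed-size lines a set of accesses touches. The sketch below counts them for 32 threads, assuming 4-byte elements and a 64-byte line as the transaction size. Both numbers are illustrative assumptions; real GPUs use their own transaction granularities.

```python
LINE_BYTES = 64  # assumed line / transaction size, in bytes

def lines_touched(addresses, size=4, line=LINE_BYTES):
    """Count the distinct lines covered by `size`-byte reads at each address."""
    touched = set()
    for addr in addresses:
        first = addr // line
        last = (addr + size - 1) // line
        touched.update(range(first, last + 1))
    return len(touched)

# 32 threads reading consecutive 4-byte values: 128 bytes, two lines.
coalesced = [4 * i for i in range(32)]
# 32 threads each reading from a different line: 32 separate lines.
strided = [64 * i for i in range(32)]

print(lines_touched(coalesced))  # 2
print(lines_touched(strided))    # 32
```

The contiguous pattern needs 16 times fewer line fetches than the strided one, which is the gap coalescing is meant to close.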

D

Diode Ch 17: A two-terminal semiconductor device that conducts in one direction only. Formed by a p-n junction.
Doping Ch 17: Deliberately adding impurity atoms (phosphorus, boron) to silicon to create n-type or p-type regions.
DRAM Ch 24: Dynamic Random Access Memory — the cheap, dense, off-chip memory that holds most of a program's working set.

E

EDA Ch 6: Electronic Design Automation — the software (Cadence, Synopsys) used to design integrated circuits.
Embedding Ch 30: Mapping a discrete token to a high-dimensional vector via a learned table lookup.
Embedding Ch 32: A learned dense vector that represents a chunk of text (or other data) so that similar meanings sit nearby in vector space.
Epitaxy Ch 7: Growing a crystalline layer on top of a crystalline substrate, atom by atom.
EUV Ch 8: Extreme Ultraviolet — light of wavelength ~13.5 nm used in the most advanced lithography.
Eval Ch 40: A scored test that measures how well a model performs a task. Modern eval suites (SWE-bench, MMLU, GPQA) are how progress is tracked.

F

Fab Ch 11: Semiconductor fabrication facility. The most expensive industrial buildings ever made.
Fetch–Decode–Execute Ch 22: The fundamental instruction cycle: read an instruction, figure out what it does, do it, repeat.
Flip-flop Ch 20: A bistable memory element that stores one bit and updates on a clock edge.
Flywheel Ch 40: A self-reinforcing loop where deployment generates data, data improves the model, and the better model attracts more deployment.
FP8 / FP16 / FP32 Ch 29: 8-, 16-, and 32-bit floating-point formats. Modern AI training uses BF16 or FP8 to save memory and bandwidth.
Function calling Ch 32: A model's ability to emit a structured tool invocation (name + JSON arguments) instead of free text.
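
The Fetch–Decode–Execute entry above can be sketched as a loop over a toy machine. The four-instruction set used here (LOADI, ADD, JNZ, HALT) and the four-register layout are invented for illustration and do not correspond to any real ISA.

```python
def run(program, max_steps=10_000):
    """Run a toy program and return the final register values."""
    regs = [0, 0, 0, 0]
    pc = 0  # program counter: index of the next instruction to fetch
    for _ in range(max_steps):
        instr = program[pc]        # fetch
        pc += 1
        op, *args = instr          # decode
        if op == "LOADI":          # execute: load an immediate into a register
            regs[args[0]] = args[1]
        elif op == "ADD":          # regs[a] += regs[b]
            regs[args[0]] += regs[args[1]]
        elif op == "JNZ":          # jump to target if the register is non-zero
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit reached")

# Sum 5 + 4 + 3 + 2 + 1 into register 0.
program = [
    ("LOADI", 0, 0),    # r0 = accumulator
    ("LOADI", 1, 5),    # r1 = counter
    ("LOADI", 2, -1),   # r2 = constant -1
    ("ADD", 0, 1),      # r0 += r1
    ("ADD", 1, 2),      # r1 -= 1
    ("JNZ", 1, 3),      # loop while r1 != 0
    ("HALT",),
]
print(run(program))  # [15, 0, -1, 0]
```

The program counter is advanced before the instruction executes, so a jump simply overwrites it.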

G

Gate Ch 18: (1) The control electrode of a transistor. (2) A logic primitive (AND, OR, NAND) built from transistors.
GPU Ch 28: Graphics Processing Unit. A throughput-oriented processor with thousands of simple ALUs and tensor cores.
GRUB Ch 25: GRand Unified Bootloader — the standard Linux bootloader.

H

HBM Ch 12: High-Bandwidth Memory — stacked DRAM tightly coupled to a GPU through a silicon interposer.
Hole Ch 17: A missing electron in a semiconductor's bonding lattice; behaves as a positive charge carrier.

I

Inference Ch 30: Running a trained model forward to produce outputs (e.g., next-token prediction).
Instruction set (ISA) Ch 23: The contract between hardware and software: the menu of operations a CPU promises to execute.
Interconnect Ch 10: The miles of copper wiring stacked above the transistors on a chip.
Interposer Ch 12: A silicon substrate with fine wiring that connects multiple dies in a 2.5D package.

K

Kernel Ch 26: (1) The privileged core of an operating system. (2) A function dispatched to a GPU.
KV cache Ch 30: In a transformer, the stored Key and Value tensors that let an autoregressive model avoid recomputing past tokens.

L

L1 / L2 / L3 Ch 24: The three on-chip cache levels above main memory. L1 is fastest and smallest; L3 is largest and slowest.
Latency budget Ch 33: The total time an agent has to think and act before the user gives up. Each step in the loop spends part of it.
Lithography Ch 8: Also called photolithography. Patterning a chip by projecting an image of a circuit onto a photoresist-coated wafer.
LLVM IR Ch 27: An intermediate representation used by Clang, Rust, Swift, and many other compilers.
Logic gate Ch 19: A circuit (AND, OR, NAND, NOR, XOR) that produces a Boolean output from Boolean inputs.

M

Machine code Ch 27: The actual bytes a CPU fetches and executes.
Matrix multiply (matmul) Ch 29: The fundamental operation of neural networks. Tensor cores execute it as a hardware primitive.
MCP Ch 36: Model Context Protocol — Anthropic's open standard (2024) for letting models talk to external tools and data sources.
MLP Ch 29: Multi-Layer Perceptron — a stack of linear layers and nonlinearities. The 'feed-forward' half of a transformer block.
MMU Ch 26: Memory Management Unit — the hardware that translates virtual addresses to physical via page tables.
MOSFET Ch 18: Metal-Oxide-Semiconductor Field-Effect Transistor — the dominant transistor in modern chips.

N

NAND Ch 19: A logic gate whose output is the negation of AND. It is functionally complete: any logic can be built from NAND alone.
NVL72 Ch 14: NVIDIA's rack-scale system that connects 72 GPUs over a copper midplane.
NVLink Ch 14: NVIDIA's high-bandwidth GPU-to-GPU interconnect.
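
The claim in the NAND entry, that any logic can be built from NAND alone, can be checked directly for the common gates. This is a minimal sketch using 0 and 1 as bit values; the function names are illustrative.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: 0 when both inputs are 1, otherwise 1."""
    return 1 - (a & b)

def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    # The classic four-NAND construction.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Print the full truth table for the derived gates.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_(a, b), or_(a, b), xor(a, b))
```

Every derived gate above bottoms out in calls to `nand`, so the truth table is produced by that one primitive.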

O

Occupancy Ch 28: The ratio of active warps to maximum warps on a GPU streaming multiprocessor. High occupancy hides latency.
Operating system Ch 26: The software that manages hardware resources and provides services (processes, files, networking) to programs.

P

p-n junction Ch 17: The boundary between p-type and n-type silicon — the building block of every diode and transistor.
Page fault Ch 26: An exception raised by the MMU when a program accesses a virtual address whose physical page isn't present.
Page table Ch 26: A tree structure in memory that the MMU walks to translate virtual addresses to physical.
Pipeline Ch 21: Overlapping the stages of multiple instructions so the CPU completes one per cycle even though each takes several.
Polysilicon Ch 3: Ultra-pure (9N) silicon produced by the Siemens process from trichlorosilane.
Process Ch 26: An OS-level running program with its own address space and identity.
Program counter Ch 22: The CPU register holding the address of the next instruction to fetch.
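
The page table and page fault entries above describe a tree walk that either yields a physical address or raises an exception. The sketch below models a two-level table with nested dictionaries. The 4096-byte page size, the 10 index bits per level, and all names are assumptions for illustration, not a description of any particular MMU.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
LEVEL_BITS = 10    # assumed number of index bits per table level

class PageFault(Exception):
    """Raised when no physical page is mapped for a virtual address."""

def translate(root: dict, vaddr: int) -> int:
    """Walk a two-level page table and return the physical address."""
    offset = vaddr % PAGE_SIZE
    vpn = vaddr // PAGE_SIZE                 # virtual page number
    top = vpn >> LEVEL_BITS                  # index into the root table
    low = vpn & ((1 << LEVEL_BITS) - 1)      # index into the second level
    second = root.get(top)
    if second is None or low not in second:
        raise PageFault(hex(vaddr))
    return second[low] * PAGE_SIZE + offset  # frame base plus offset

# Virtual page 1 is mapped to physical frame 7; nothing else is present.
root = {0: {1: 7}}
print(hex(translate(root, 0x1234)))  # 0x7234
```

Calling `translate(root, 0x5000)` raises `PageFault`, which is the point where a real kernel would load the page or terminate the process.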

Q

Quartzite Ch 1: A high-purity SiO₂ rock, the geological starting point of the silicon supply chain.

R

RAG Ch 37: Retrieval-Augmented Generation — fetching relevant chunks from a vector store and injecting them into the prompt before the model answers.
Register Ch 24: A named storage slot inside a CPU, accessed in a single cycle.
Reset Ch 25: The signal that forces all flip-flops on a chip to a known state when power comes up.
Routing Ch 39: Choosing which model to send a request to based on price, latency, or capability. The new commoditization layer above the model APIs.
Rubin Ch 13: NVIDIA's GPU architecture following Blackwell, comprising the Vera Rubin Superchip and the Rubin Ultra rack.

S

Scheduler Ch 26: The OS code that decides which process runs next on each CPU core.
SIMT Ch 28: Single Instruction, Multiple Threads — NVIDIA's GPU execution model. A refinement of SIMD.
Softmax Ch 30: A function that turns a vector of real numbers into a probability distribution.
Swarm Ch 35: A coordinated group of agents (planner, researchers, writers, critics) that share state and aim at one outcome.
System call Ch 26: A controlled trap from a user program into the OS kernel to request a privileged service.
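
The softmax entry above fits in a few lines of code. This sketch uses the common trick of subtracting the maximum before exponentiating, which leaves the result unchanged but avoids overflow on large inputs; it is one reasonable implementation, not necessarily the one used in the book.

```python
import math

def softmax(xs):
    """Map a list of real numbers to probabilities that sum to 1."""
    m = max(xs)                            # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)                     # larger inputs get larger probabilities
print(softmax([1000.0, 1000.0])) # [0.5, 0.5], no overflow
```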

T

Tensor core Ch 28: A specialized GPU unit that performs a small matrix multiply-accumulate as a single instruction.
Threshold voltage Ch 18: The gate voltage above which a MOSFET starts to conduct. Typically ~0.4-0.7 V.
TLB Ch 24: Translation Look-aside Buffer — a small cache of recent virtual-to-physical translations.
Token Ch 30: A subword unit of text used by language models. A typical model has ~100,000 tokens in its vocabulary.
Token Ch 32: The unit a language model reads and writes. Roughly four characters of English on average.
Tool use Ch 34: A model's ability to invoke external functions, APIs, browsers, or code interpreters mid-conversation.
Transformer Ch 29: A neural-network architecture (Vaswani et al. 2017) built from stacked self-attention and MLP blocks.
Transistor Ch 18: The fundamental electronic switch. A modern GPU has ~80 billion of them.
TSMC Ch 11: Taiwan Semiconductor Manufacturing Company — the world's leading contract chip fabricator.

U

UEFI Ch 25: Unified Extensible Firmware Interface — the modern PC firmware standard, replacing legacy BIOS.

V

Valence band Ch 17: The band of energy levels in a solid in which electrons are bound to atoms.
Vector database Ch 37: A datastore (FAISS, pgvector, Pinecone, Weaviate) optimized for nearest-neighbor search over embeddings.
Virtual memory Ch 26: An OS abstraction that gives every process its own private address space, translated by the MMU.
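
The nearest-neighbor search named in the vector database entry can be sketched as a brute-force scan. Real systems such as those listed use approximate indexes to avoid scanning everything; the cosine-similarity ranking, the two-dimensional vectors, and the names below are assumptions chosen to keep the example small.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=1):
    """Return the k keys whose vectors are most similar to the query."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Made-up embeddings; a real store would hold model-produced vectors.
store = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
print(top_k([1.0, 0.0], store, k=2))  # ['cat', 'dog']
```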

W

Wafer Ch 5: A thin disk of single-crystal silicon (typically 300 mm diameter) on which chips are fabricated.
Warp Ch 28: A group of 32 GPU threads that execute the same instruction in lockstep. The atomic unit of GPU work.
Weight Ch 29: A learned numerical parameter inside a neural network.

Y

Yield Ch 11: The fraction of dies on a wafer that pass test. The most-guarded number in the semiconductor industry.