Reference

Glossary

103 terms used in the book — band gaps, MOSFETs, MMUs, tensor cores. Each entry links to the chapter where the idea first appears.

#

µop (micro-op) · Ch 27
An internal sub-instruction that a CPU's decoder produces from a machine-code instruction. Modern x86 cores execute these out of order.

A

A2A · Ch 36
Agent-to-Agent protocol. Google's open standard (announced 2025) for letting one agent invoke another over the network.
Agent · Ch 34
A model placed in a loop with tools, memory, and a goal — capable of multi-step action without per-step human prompting.
ALU · Ch 20
Arithmetic Logic Unit. The part of a CPU that performs arithmetic and bitwise operations.
ASML · Ch 8
The Dutch company that is the world's only maker of EUV lithography machines.
Assembly · Ch 27
A human-readable form of machine code, with named registers and labels.
AST · Ch 27
Abstract Syntax Tree — a tree representation of a program's structure produced by a parser.
Attention · Ch 29
A neural-network operation that lets each token blend information from every other token via Q, K, V matrices.
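A minimal sketch of the blending step, assuming a single head, no masking, and Q, K, V already given as plain lists of rows (a real model derives them from learned projections). The function names are chosen for this example only.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1 (max subtracted for stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each query row gets a weighted average of V's rows."""
    d = len(K[0])  # key dimension, used for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        # Similarity of this query to every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Blend the value rows using those weights.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

When every key is identical, the weights are uniform and the output is simply the mean of the value rows.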

B

Band gap · Ch 17
The energy distance between a material's valence and conduction bands. Silicon's is 1.12 eV.
BF16 · Ch 29
Brain Floating Point 16 — a 16-bit float format with FP32's exponent range and reduced mantissa, used in deep learning.
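One way to see the shared exponent range: a BF16 bit pattern is the top 16 bits of the corresponding FP32 pattern (1 sign bit, 8 exponent bits, 7 mantissa bits). The sketch below truncates instead of rounding, which real hardware usually does not do, and the helper names are invented for this example.

```python
import struct

def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bf16_bits(x: float) -> int:
    """Keep sign, exponent, and the top 7 mantissa bits (truncation, no rounding)."""
    return fp32_bits(x) >> 16

def from_bf16_bits(b: int) -> float:
    """Widen a 16-bit BF16 pattern back to a float by zero-filling the low mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Round-tripping a value such as 1.001 through these helpers gives back 1.0, which shows the precision lost to the shorter mantissa while the magnitude range is preserved.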
BIOS · Ch 25
Basic Input/Output System — the legacy firmware that boots a PC. Replaced by UEFI on modern systems.
Bootloader · Ch 25
A small program (e.g. GRUB) that loads an operating-system kernel into memory and jumps to it.
Bytecode · Ch 27
A compact, platform-independent intermediate representation of a program, executed by a virtual machine.

C

Cache line · Ch 24
The smallest unit a CPU cache moves between memory levels — typically 64 bytes.
Channel · Ch 18
The narrow conducting region under a MOSFET's gate, formed when the gate voltage is high enough to invert the surface.
Clock · Ch 21
A regular signal that synchronizes all the flip-flops on a chip. Modern CPUs run at ~3-5 GHz.
CMOS · Ch 18
Complementary Metal-Oxide-Semiconductor — the dominant transistor technology, pairing n-MOS and p-MOS for low static power.
Coalescing · Ch 28
On a GPU, the property of a memory access pattern that lets the hardware fold many threads' reads into a single transaction.
Compiler · Ch 27
A program that translates source code into a lower-level language, typically machine code via an IR.
Conduction band · Ch 17
The band of energy levels in a solid in which electrons are free to move and conduct current.
Context switch · Ch 26
The kernel operation of saving one process's state and loading another's. Costs a few microseconds on modern hardware.
Context window · Ch 32
The maximum number of tokens a model can attend to at once. The price of one full window is the unit cost of one 'thought'.
CoWoS · Ch 12
Chip-on-Wafer-on-Substrate — TSMC's 2.5D packaging that mounts GPU dies and HBM stacks on a silicon interposer.
CPU · Ch 22
Central Processing Unit. A general-purpose processor optimized for branchy, latency-sensitive single-thread work.
CUDA · Ch 28
NVIDIA's parallel-programming model and runtime for GPUs.
Czochralski · Ch 4
The process of growing a single silicon crystal by dipping a seed into a melt and slowly pulling upward.

D

Diode · Ch 17
A two-terminal semiconductor device that conducts in one direction only. Formed by a p-n junction.
Doping · Ch 17
Deliberately adding impurity atoms (phosphorus, boron) to silicon to create n-type or p-type regions.
DRAM · Ch 24
Dynamic Random Access Memory — the cheap, dense, off-chip memory that holds most of a program's working set.

E

EDA · Ch 6
Electronic Design Automation — the software (Cadence, Synopsys) used to design integrated circuits.
Embedding · Ch 30
Mapping a discrete token to a high-dimensional vector via a learned table lookup.
Embedding · Ch 32
A learned dense vector that represents a chunk of text (or other data) so that similar meanings sit nearby in vector space.
Epitaxy · Ch 7
Growing a crystalline layer on top of a crystalline substrate, atom by atom.
EUV · Ch 8
Extreme Ultraviolet — light of wavelength ~13.5 nm used in the most advanced lithography.
Eval · Ch 40
A scored test that measures how well a model performs a task. Modern eval suites (SWE-bench, MMLU, GPQA) are how progress is tracked.

F

Fab · Ch 11
Semiconductor fabrication facility. The most expensive industrial buildings ever made.
Fetch–Decode–Execute · Ch 22
The fundamental instruction cycle: read an instruction, figure out what it does, do it, repeat.
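The cycle can be sketched as a loop over a toy accumulator machine. The three opcodes (LOAD, ADD, HALT) and the tuple encoding are invented for illustration and do not correspond to any real instruction set.

```python
def run(program, max_steps=1000):
    """Run a toy accumulator machine; each instruction is an (opcode, operand) pair."""
    pc, acc = 0, 0                      # program counter and accumulator
    for _ in range(max_steps):
        opcode, operand = program[pc]   # fetch the instruction at pc
        pc += 1                         # point at the next instruction
        if opcode == "LOAD":            # decode, then execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("step limit reached without HALT")
```

For example, `run([("LOAD", 2), ("ADD", 3), ("HALT", 0)])` loads 2, adds 3, and stops with 5 in the accumulator.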
Flip-flop · Ch 20
A bistable memory element that stores one bit and updates on a clock edge.
Flywheel · Ch 40
A self-reinforcing loop where deployment generates data, data improves the model, and the better model attracts more deployment.
FP8 / FP16 / FP32 · Ch 29
8-, 16-, and 32-bit floating-point formats. Modern AI training uses BF16 or FP8 to save memory and bandwidth.
Function calling · Ch 32
A model's ability to emit a structured tool invocation (name + JSON arguments) instead of free text.
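A minimal sketch of what the receiving side might do with such an invocation, assuming the model's output is a JSON object with `name` and `arguments` keys. The `get_weather` tool and the `TOOLS` registry are hypothetical, and the exact wire format differs between providers.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an external service."""
    return f"(pretend forecast for {city})"

# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured tool invocation and call the matching function."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]          # KeyError if the model names an unknown tool
    return func(**call["arguments"])    # JSON arguments become keyword arguments
```

The structured form matters because the caller can validate the name and arguments before anything runs, which free text would not allow.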

G

Gate · Ch 18
(1) The control electrode of a transistor. (2) A logic primitive (AND, OR, NAND) built from transistors.
GPU · Ch 28
Graphics Processing Unit. A throughput-oriented processor with thousands of simple ALUs and tensor cores.
GRUB · Ch 25
GRand Unified Bootloader — the standard Linux bootloader.

H

HBM · Ch 12
High-Bandwidth Memory — stacked DRAM tightly coupled to a GPU through a silicon interposer.
Hole · Ch 17
A missing electron in a semiconductor's bonding lattice; behaves as a positive charge carrier.

I

Inference · Ch 30
Running a trained model forward to produce outputs (e.g., next-token prediction).
Instruction set (ISA) · Ch 23
The contract between hardware and software: the menu of operations a CPU promises to execute.
Interconnect · Ch 10
The miles of copper wiring stacked above the transistors on a chip.
Interposer · Ch 12
A silicon substrate with fine wiring that connects multiple dies in a 2.5D package.

K

Kernel · Ch 26
(1) The privileged core of an operating system. (2) A function dispatched to a GPU.
KV cache · Ch 30
In a transformer, the stored Key and Value tensors that let an autoregressive model avoid recomputing past tokens.

L

L1 / L2 / L3 · Ch 24
The three on-chip cache levels above main memory. L1 is fastest and smallest; L3 is largest and slowest.
Latency budget · Ch 33
The total time an agent has to think and act before the user gives up. Each step in the loop spends part of it.
Lithography · Ch 8
Also called photolithography. Patterning a chip by projecting an image of a circuit onto a photoresist-coated wafer.
LLVM IR · Ch 27
The intermediate representation of the LLVM compiler framework, used by Clang, the Rust and Swift compilers, and many others.
Logic gate · Ch 19
A circuit (AND, OR, NAND, NOR, XOR) that produces a Boolean output from Boolean inputs.

M

Machine code · Ch 27
The actual bytes a CPU fetches and executes.
Matrix multiply (matmul) · Ch 29
The fundamental operation of neural networks. Tensor cores execute it as a hardware primitive.
MCP · Ch 36
Model Context Protocol — Anthropic's open standard (2024) for letting models talk to external tools and data sources.
MLP · Ch 29
Multi-Layer Perceptron — a stack of linear layers and nonlinearities. The 'feed-forward' half of a transformer block.
MMU · Ch 26
Memory Management Unit — the hardware that translates virtual addresses to physical via page tables.
MOSFET · Ch 18
Metal-Oxide-Semiconductor Field-Effect Transistor — the dominant transistor in modern chips.

N

NAND · Ch 19
A logic gate whose output is the negation of AND. It is functionally complete: any logic function can be built from NAND gates alone.
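The claim that any logic can be built from NAND alone can be checked on a small scale. The sketch below derives NOT, AND, OR, and XOR using nothing but a `nand` function; the trailing-underscore names are chosen only to avoid Python keywords.

```python
def nand(a: bool, b: bool) -> bool:
    """The only primitive: false exactly when both inputs are true."""
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)                    # NAND of a signal with itself inverts it

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))              # invert NAND to recover AND

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))        # De Morgan: a OR b = NOT(NOT a AND NOT b)

def xor_(a: bool, b: bool) -> bool:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))  # the classic four-NAND XOR
```

Comparing each derived gate against Python's own operators over all four input pairs confirms the truth tables.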
NVL72 · Ch 14
NVIDIA's rack-scale system that connects 72 GPUs over a copper midplane.
NVLink · Ch 14
NVIDIA's high-bandwidth GPU-to-GPU interconnect.

O

Occupancy · Ch 28
The ratio of active warps to maximum warps on a GPU streaming multiprocessor. High occupancy hides latency.
Operating system · Ch 26
The software that manages hardware resources and provides services (processes, files, networking) to programs.

P

p-n junction · Ch 17
The boundary between p-type and n-type silicon — the building block of every diode and transistor.
Page fault · Ch 26
An exception raised by the MMU when a program accesses a virtual address whose physical page isn't present.
Page table · Ch 26
A tree structure in memory that the MMU walks to translate virtual addresses to physical.
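A rough software model of the walk, assuming 4 KiB pages and a two-level tree with 10-bit indices (the layout of classic 32-bit x86; other architectures differ). Nested dictionaries stand in for the in-memory tables, and a missing entry raises a hypothetical `PageFault` exception.

```python
PAGE_SHIFT = 12                        # 4 KiB pages (assumed)
INDEX_BITS = 10                        # index width per level (assumed)
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

class PageFault(Exception):
    """Raised when no mapping exists for the requested virtual address."""

def translate(root: dict, vaddr: int) -> int:
    """Walk a two-level table: root[top][leaf] holds the physical frame number."""
    top = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    leaf = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    offset = vaddr & OFFSET_MASK       # the offset within the page is never translated
    try:
        frame = root[top][leaf]
    except KeyError:
        raise PageFault(hex(vaddr)) from None
    return (frame << PAGE_SHIFT) | offset
```

With `root = {1: {2: 0x80}}`, the virtual address `0x402034` splits into top index 1, leaf index 2, and offset `0x34`, so it maps to physical address `0x80034`.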
Pipeline · Ch 21
Overlapping the stages of multiple instructions so the CPU can complete up to one instruction per cycle even though each instruction takes several cycles.
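The arithmetic behind that claim, as a sketch for an idealized pipeline with no stalls, hazards, or branch mispredictions: the first instruction needs one cycle per stage, and each later instruction finishes one cycle after its predecessor. The function names are illustrative.

```python
def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * stages

def pipelined_cycles(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)
```

For 100 instructions on a 5-stage pipeline this gives 104 cycles instead of 500, which approaches one instruction per cycle as the instruction count grows.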
Polysilicon · Ch 3
Ultra-pure silicon (9N, meaning 99.9999999% pure) produced by the Siemens process from trichlorosilane.
Process · Ch 26
An OS-level running program with its own address space and identity.
Program counter · Ch 22
The CPU register holding the address of the next instruction to fetch.

Q

Quartzite · Ch 1
A high-purity SiO₂ rock, the geological starting point of the silicon supply chain.

R

RAG · Ch 37
Retrieval-Augmented Generation — fetching relevant chunks from a vector store and injecting them into the prompt before the model answers.
Register · Ch 24
A named storage slot inside a CPU, accessed in a single cycle.
Reset · Ch 25
The signal that forces all flip-flops on a chip to a known state when power comes up.
Routing · Ch 39
Choosing which model to send a request to based on price, latency, or capability. The new commoditization layer above the model APIs.
Rubin · Ch 13
NVIDIA's GPU architecture following Blackwell, comprising the Vera Rubin Superchip and the Rubin Ultra rack.

S

Scheduler · Ch 26
The OS code that decides which process runs next on each CPU core.
SIMT · Ch 28
Single Instruction, Multiple Threads — NVIDIA's GPU execution model. A refinement of SIMD.
Softmax · Ch 30
A function that turns a vector of real numbers into a probability distribution.
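Concretely, softmax exponentiates each entry and divides by the sum of the exponentials. The sketch below is plain Python and subtracts the maximum first, a common trick that avoids overflow without changing the result.

```python
import math

def softmax(xs):
    """Map real numbers to positive weights that sum to 1."""
    m = max(xs)                              # subtracting the max keeps exp() in range
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Equal inputs yield a uniform distribution, and larger inputs receive exponentially larger shares.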
Swarm · Ch 35
A coordinated group of agents (planner, researchers, writers, critics) that share state and aim at one outcome.
System call · Ch 26
A controlled trap from a user program into the OS kernel to request a privileged service.

T

Tensor core · Ch 28
A specialized GPU unit that performs a small matrix multiply-accumulate as a single instruction.
Threshold voltage · Ch 18
The gate voltage above which a MOSFET starts to conduct. Typically ~0.4-0.7 V.
TLB · Ch 24
Translation Look-aside Buffer — a small cache of recent virtual-to-physical translations.
Token · Ch 30
A subword unit of text used by language models. A typical model has ~100,000 tokens in its vocabulary.
Token · Ch 32
The unit a language model reads and writes. Roughly four characters of English on average.
Tool use · Ch 34
A model's ability to invoke external functions, APIs, browsers, or code interpreters mid-conversation.
Transformer · Ch 29
A neural-network architecture (Vaswani et al. 2017) built from stacked self-attention and MLP blocks.
Transistor · Ch 18
The fundamental electronic switch. A modern GPU has ~80 billion of them.
TSMC · Ch 11
Taiwan Semiconductor Manufacturing Company — the world's leading contract chip fabricator.

U

UEFI · Ch 25
Unified Extensible Firmware Interface — the modern PC firmware standard, replacing legacy BIOS.

V

Valence band · Ch 17
The band of energy levels in a solid in which electrons are bound to atoms.
Vector database · Ch 37
A datastore (FAISS, pgvector, Pinecone, Weaviate) optimized for nearest-neighbor search over embeddings.
Virtual memory · Ch 26
An OS abstraction that gives every process its own private address space, translated by the MMU.

W

Wafer · Ch 5
A thin disk of single-crystal silicon (typically 300 mm diameter) on which chips are fabricated.
Warp · Ch 28
On NVIDIA GPUs, a group of 32 threads that execute the same instruction in lockstep. The atomic unit of GPU work.
Weight · Ch 29
A learned numerical parameter inside a neural network.

Y

Yield · Ch 11
The fraction of dies on a wafer that pass test. The most-guarded number in the semiconductor industry.