Markets of Models

9 atomic recall cards. Export to Anki and let spaced repetition do its slow work.

In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Sand to Silicon · Ch 39, note type = Basic.

Front	Back
By roughly how much did frontier-class inference cost per token fall from 2023 to 2026?	~95% — from ~$30/M tokens to under $2/M tokens.
How many models does a typical production routing layer choose between?	~12–20 models.
What share of API traffic at major aggregators went through routers by 2026?	~30% of API traffic at major aggregators.
Name three commercial routing-layer products mentioned in the chapter.	OpenRouter, major cloud gateway products (Bedrock, Vertex, Azure AI), and specialist routers (cheaper-for-bulk, faster-for-real-time, regulated-for-finance).
What historical analogy does the chapter draw to explain how the routing layer captures surplus?	Airline GDS systems in the 1980s, payment-card networks, and stock exchanges — intermediaries that aggregated and standardized supply.
Name three open-weight inference specialists mentioned in the chapter.	Together, Fireworks, Replicate, and Groq (any three of these four).
What fraction of frontier price does a mid-tier model typically cost, per the chapter's cost-curve description?	~30% of frontier price for ~80–90% of the quality on most tasks.
How did frontier vendors behave on pricing when a competitor cut rates on a mid-tier model?	They typically responded within weeks — Haiku, smaller GPT-5 variants, and Gemini Flash all leapfrogged each other on price-per-million-tokens through 2024–2025.
What is the aggregate cost reduction a sophisticated buyer can achieve through multi-model arbitrage?	3–10× cost reduction over naive single-model deployment for the same task quality.