Chapter 05 · The Reference Stack — Building an Agentic Enterprise

Every enterprise eventually draws an agent stack. Some draw it well, some draw it once and forget. The shape that's converging across well-run programmes — across financial services, technology firms, healthcare, retail — has six layers. Naming them out loud is the first step to actually building them.

Layer 1 — Experience

Where users and systems meet the agent. Chat windows are the cliché, and they have their place, but the most successful enterprise agents in 2026 are embedded: in the support tool the rep already uses, in the email client where the salesperson already lives, in the IDE where the engineer is already working. The agent that requires a context-switch to use is the agent that gets used less.

The layer also includes voice (call centres are the largest agent surface in financial services), email (one of the most useful surfaces, because email is asynchronous and forgiving), and machine-to-machine surfaces (other systems calling your agent via an API or the A2A protocol). Each surface has its own latency budget and its own UX rules. None can be ignored.

Layer 2 — Orchestration

The runtime that coordinates the agent loop, manages tool calls, handles retries, branches on conditions, and schedules sub-agents. The serious choices in mid-2026 are LangGraph (graph-based, good for explicit control flow), CrewAI (role-based, good for multi-agent with simple coordination), Microsoft Semantic Kernel (integrates well with Azure and the Microsoft stack), OpenAI's Agents SDK (native to the OpenAI ecosystem), Google ADK / Vertex Agents, and AWS Bedrock Agents. Beyond those, n8n and Microsoft Power Automate are no-code/low-code options worth taking seriously for narrower flows.

The bad news: switching cost between orchestrators is real. The agent's prompts, tool wrappers, memory schema, and tracing format are all framework-specific. The good news: the core abstractions are converging, and a thin internal abstraction layer (your own "agent runner" interface) buys you the option of switching later. Almost every team that did this is glad they did.
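The "agent runner" idea can be sketched as a small Python interface. This is a minimal illustration, not any framework's API: `AgentRunner`, `RunResult`, `EchoRunner` and `handle_ticket` are hypothetical names, and `EchoRunner` stands in for the adapter you would write around LangGraph, the OpenAI Agents SDK, or whichever orchestrator you pick.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RunResult:
    """Framework-neutral result of one agent run (hypothetical shape)."""
    output: str
    tool_calls: list[str] = field(default_factory=list)


class AgentRunner(Protocol):
    """The thin internal interface: callers never import an orchestrator directly."""
    def run(self, agent_name: str, user_input: str) -> RunResult: ...


class EchoRunner:
    """Stand-in adapter. A real one would translate to and from a specific framework."""
    def run(self, agent_name: str, user_input: str) -> RunResult:
        return RunResult(output=f"[{agent_name}] {user_input}")


def handle_ticket(runner: AgentRunner, ticket_text: str) -> str:
    # Business code depends only on AgentRunner, so swapping orchestrators
    # means writing one new adapter, not rewriting every call site.
    return runner.run("support-triage", ticket_text).output


print(handle_ticket(EchoRunner(), "Reset my password"))  # [support-triage] Reset my password
```

Prompts, tool wrappers and tracing still need porting when you switch; the point of the interface is that the blast radius stays inside one adapter.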

Layer 3 — Registry & Identity

The most-skipped layer, and the layer that most distinguishes a programme from a pilot.

An agent registry is the catalogue of every agent in the organisation: name, owner, business purpose, scope, tools it can call, model it uses, data it touches, regulatory classification, current version, owner-on-call, kill-switch URL, last eval pass date. Without it, you cannot answer "how many agents are in production?" with a straight face after the first dozen. With it, you can. We dig into the registry's schema and lifecycle in Chapter 12.
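To make the registry concrete, here is one possible record shape in Python, with two queries the text argues you should be able to answer. The field names are our own mapping of the catalogue above; the `stage` field, the 30-day eval-freshness policy, the model names and the example.com kill-switch URLs are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AgentRecord:
    # Fields mirror the catalogue described above; names and types are illustrative.
    name: str
    owner: str
    business_purpose: str
    scope: str
    tools: tuple[str, ...]
    model: str
    data_touched: tuple[str, ...]
    regulatory_class: str
    version: str
    owner_on_call: str
    kill_switch_url: str
    last_eval_pass: date
    stage: str = "pilot"  # assumption: a lifecycle stage is not in the list above


def production_count(registry: list[AgentRecord]) -> int:
    return sum(1 for a in registry if a.stage == "production")


def stale_evals(registry: list[AgentRecord], today: date, max_age_days: int = 30) -> list[str]:
    """Names of agents whose last passing eval is older than max_age_days (hypothetical policy)."""
    cutoff = today - timedelta(days=max_age_days)
    return [a.name for a in registry if a.last_eval_pass < cutoff]


registry = [
    AgentRecord("support-triage", "cx-platform", "Route inbound tickets", "read-only CRM",
                ("crm.lookup",), "model-x", ("tickets",), "low", "1.4.0",
                "cx-oncall", "https://agents.example.com/kill/support-triage",
                date(2026, 5, 20), stage="production"),
    AgentRecord("invoice-checker", "finance-ops", "Flag invoice mismatches", "read-only ERP",
                ("erp.query",), "model-y", ("invoices",), "high", "0.3.1",
                "fin-oncall", "https://agents.example.com/kill/invoice-checker",
                date(2026, 3, 1)),
]
print(production_count(registry))                     # 1
print(stale_evals(registry, today=date(2026, 6, 1)))  # ['invoice-checker']
```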

The identity sub-layer is where agentic AI breaks the assumptions of traditional IAM. An agent acting on a user's behalf needs scoped, time-bound access to that user's data — not the agent's own god-mode service account. OAuth on-behalf-of, token vending services, and per-run credentials are how that gets done. The early-stage "give the agent a service account with admin rights" pattern is the pattern that ends up on the front page.
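The mechanics of a per-run credential can be sketched with the standard library alone. This is a hypothetical token vendor, not OAuth on-behalf-of itself: in production you would use your identity provider's on-behalf-of or token-exchange flow and a managed signing key. The claim names, the `vend_token`/`check_token` functions and the demo key are illustrative. What the sketch shows is the property that matters: the token acts for one user, carries only the scopes this run needs, and expires quickly.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Demo-only key. A real token vendor would sign with a managed key (KMS/HSM), never a literal.
SIGNING_KEY = b"demo-only-key"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode()


def vend_token(user_id: str, agent: str, run_id: str, scopes: list[str],
               ttl_s: int = 300, now: Optional[float] = None) -> str:
    """Mint a per-run credential: acts for one user, carries narrowed scopes, expires quickly."""
    issued = time.time() if now is None else now
    claims = {"sub": user_id, "agent": agent, "run": run_id,
              "scope": sorted(scopes), "exp": issued + ttl_s}
    body = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).digest()
    return _b64(body) + "." + _b64(sig)


def check_token(token: str, required_scope: str, now: Optional[float] = None) -> dict:
    """Return the claims if the token is authentic, unexpired and has the scope; else raise."""
    body_b64, sig_b64 = token.split(".")
    body = base64.urlsafe_b64decode(body_b64)
    expected = hmac.new(SIGNING_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(body)
    if (time.time() if now is None else now) >= claims["exp"]:
        raise PermissionError("token expired")
    if required_scope not in claims["scope"]:
        raise PermissionError(f"missing scope {required_scope}")
    return claims


token = vend_token("alice", "support-triage", "run-42", ["crm:read"], now=1000.0)
print(check_token(token, "crm:read", now=1100.0)["sub"])  # alice
```

The same token checked against `crm:write`, or presented after its five-minute lifetime (here, at or after t=1300), raises `PermissionError`. At no point does the agent hold a broad service account.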

Layer 4 — Memory

Short-term, long-term, episodic, semantic. Each needs its own substrate. Vector databases (pgvector, Pinecone, Weaviate, Qdrant) handle fuzzy long-term and episodic memory. Key-value stores handle structured long-term memory. Postgres or your existing data warehouse handles semantic memory, with a knowledge-graph or RAG layer over the top. Short-term memory lives in the context window, the cheapest and most volatile memory of all.

The honest test of your memory layer is governance, not technology. Who can write to long-term memory? Who can read it? When does it expire? Is PII flagged? Is there a "right to be forgotten" path? Most memory layers are perfectly engineered and perfectly ungoverned, and that's a regulatory accident waiting to happen.
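The governance questions above translate fairly directly into code. The sketch below wraps a plain in-memory list (standing in for a vector or key-value store) with a write allow-list, a read allow-list, a TTL, a PII flag that is filtered out by default, and a `forget` path. `GovernedMemory` and its policies are hypothetical; a real deployment would enforce them in the store or in a service in front of it, and would purge expired rows rather than only hiding them.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    subject: str        # the person this memory is about (the right-to-be-forgotten key)
    text: str
    written_by: str     # the agent that wrote it
    expires_at: float
    contains_pii: bool


class GovernedMemory:
    """Hypothetical governance wrapper; the backing store here is a plain list."""

    def __init__(self, writers: set[str], readers: set[str], default_ttl_s: float):
        self.writers, self.readers, self.ttl = writers, readers, default_ttl_s
        self._items: list[MemoryItem] = []

    def write(self, agent: str, subject: str, text: str, contains_pii: bool, now: float) -> None:
        if agent not in self.writers:
            raise PermissionError(f"{agent} may not write long-term memory")
        self._items.append(MemoryItem(subject, text, agent, now + self.ttl, contains_pii))

    def read(self, agent: str, subject: str, now: float, allow_pii: bool = False) -> list[str]:
        if agent not in self.readers:
            raise PermissionError(f"{agent} may not read long-term memory")
        # Expired items are hidden; PII is hidden unless the caller explicitly opts in.
        return [m.text for m in self._items
                if m.subject == subject and m.expires_at > now
                and (allow_pii or not m.contains_pii)]

    def forget(self, subject: str) -> int:
        """Right-to-be-forgotten path: delete everything about one subject, return the count."""
        before = len(self._items)
        self._items = [m for m in self._items if m.subject != subject]
        return before - len(self._items)


mem = GovernedMemory(writers={"support-triage"}, readers={"support-triage", "billing-agent"},
                     default_ttl_s=86_400)
mem.write("support-triage", "alice", "Prefers email contact", contains_pii=False, now=0)
mem.write("support-triage", "alice", "Phone: 555-0100", contains_pii=True, now=0)
print(mem.read("billing-agent", "alice", now=10))       # ['Prefers email contact']
print(mem.read("billing-agent", "alice", now=100_000))  # []  (expired)
print(mem.forget("alice"))                              # 2
```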

Layer 5 — Tools

Where the agent meets the rest of your stack. Each tool is an MCP server, an internal API, or a function-call wrapper around an existing system. The tool catalogue is owned, versioned, and discoverable. The two questions to ask: can a new agent reuse an existing tool, and can we revoke a tool's access without redeploying the agent? If both answers are yes, you have a real tool layer. If not, every agent reinvents the wheel and accumulates risk.
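The two tool-layer questions, reuse and revocation without redeploy, both come down to checking access at call time against a shared catalogue. A minimal sketch, assuming a hypothetical in-process `ToolCatalogue`; in practice the grants would live in a shared service or gateway (for example in front of your MCP servers) so that a revocation reaches every running agent.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCatalogue:
    """Hypothetical central catalogue: tools are registered once, access is granted per agent."""
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    versions: dict[str, str] = field(default_factory=dict)
    grants: dict[str, set[str]] = field(default_factory=dict)  # agent name -> tool names

    def register(self, name: str, version: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn
        self.versions[name] = version

    def grant(self, agent: str, tool: str) -> None:
        self.grants.setdefault(agent, set()).add(tool)

    def revoke(self, agent: str, tool: str) -> None:
        # Takes effect on the next call; the agent's code and deployment are untouched.
        self.grants.get(agent, set()).discard(tool)

    def call(self, agent: str, tool: str, **kwargs: Any) -> Any:
        # Access is checked at call time, not baked into the agent at build time.
        if tool not in self.grants.get(agent, set()):
            raise PermissionError(f"{agent} is not granted {tool}")
        return self.tools[tool](**kwargs)


cat = ToolCatalogue()
cat.register("crm.lookup", "2.1.0", lambda customer_id: {"id": customer_id, "tier": "gold"})
cat.grant("support-triage", "crm.lookup")
cat.grant("renewals-agent", "crm.lookup")  # reuse: a second agent, same tool
print(cat.call("renewals-agent", "crm.lookup", customer_id="c-9"))  # {'id': 'c-9', 'tier': 'gold'}
cat.revoke("renewals-agent", "crm.lookup")  # the next call from renewals-agent raises
```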

The fast-moving piece here is MCP. As of mid-2026, most major model providers and tool vendors support it natively. The investment to standardise on MCP (or whatever supersedes it) pays back as soon as you have more than one agent talking to more than one tool.

Layer 6 — Evals & Observability

The cross-cutting layer that keeps the other five honest. Evals run pre-deploy; observability runs in production. The connecting thread is the trace — a structured record of every step the agent took, every tool it called, every result it got, every token of input and output, every dollar of cost.
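A trace need not be elaborate to be useful. The sketch below is one possible shape in Python: a run with an ordered list of spans carrying inputs, outputs, token counts and cost. `Trace` and `Span` are hypothetical, not the schema of LangSmith, Braintrust, Arize or OpenTelemetry, and the token counts and prices in the usage lines are made-up inputs.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Span:
    kind: str            # "llm" or "tool"
    name: str
    input_text: str
    output_text: str
    tokens_in: int = 0
    tokens_out: int = 0
    cost_usd: float = 0.0


@dataclass
class Trace:
    agent: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.time)
    spans: list[Span] = field(default_factory=list)

    def record(self, span: Span) -> None:
        self.spans.append(span)

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.spans)

    def to_json(self) -> str:
        # One JSON document per run, ready to ship to whatever trace store you use.
        return json.dumps(asdict(self))


trace = Trace(agent="support-triage")
trace.record(Span("llm", "plan", "Reset my password", "call crm.lookup", 812, 40, 0.0031))
trace.record(Span("tool", "crm.lookup", '{"customer_id": "c-9"}', '{"tier": "gold"}'))
trace.record(Span("llm", "answer", '{"tier": "gold"}', "Reset link sent.", 905, 62, 0.0036))
print(round(trace.total_cost(), 4))  # 0.0067
```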

If you do nothing else from this chapter, do this: ensure every agent run produces a trace, and that traces are searchable by humans. With searchable traces, you can debug. Without them, you cannot. Most of the cost of an evals/observability platform is justified by the first incident it lets you triage in twenty minutes instead of three days.
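"Searchable by humans" can start very small. As a hedged illustration, assuming traces are exported as one JSON object per line with `agent`, `run_id` and a `spans` list (a hypothetical format), a linear scan already answers questions like "which runs called `crm.lookup` and saw a timeout?". A trace platform or log index replaces this at any real scale.

```python
import json
from typing import Optional


def search_traces(jsonl: str, agent: Optional[str] = None, tool: Optional[str] = None,
                  text: Optional[str] = None) -> list[str]:
    """Return run_ids of traces matching every filter given (a deliberately naive scan)."""
    hits = []
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        trace = json.loads(line)
        spans = trace["spans"]
        if agent is not None and trace["agent"] != agent:
            continue
        if tool is not None and not any(s["kind"] == "tool" and s["name"] == tool for s in spans):
            continue
        # Case-insensitive substring match over every span's input and output.
        if text is not None and not any(
            text.lower() in (s["input_text"] + " " + s["output_text"]).lower() for s in spans
        ):
            continue
        hits.append(trace["run_id"])
    return hits


log = "\n".join([
    json.dumps({"agent": "support-triage", "run_id": "r1", "spans": [
        {"kind": "tool", "name": "crm.lookup", "input_text": "c-9", "output_text": "timeout"}]}),
    json.dumps({"agent": "support-triage", "run_id": "r2", "spans": [
        {"kind": "llm", "name": "answer", "input_text": "hi", "output_text": "Reset link sent."}]}),
])
print(search_traces(log, tool="crm.lookup", text="TIMEOUT"))  # ['r1']
```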

Buy, build, or rent

The pragmatic answer for a team starting in 2026: rent the orchestration runtime (LangGraph or your cloud provider's), buy the evals platform (one of LangSmith / Braintrust / Arize), build the registry and the tool layer (because both are deeply specific to your business), and treat memory as a portfolio of off-the-shelf stores plus your own thin governance layer.

The seductive alternative is to buy a full vertical stack from a single vendor — Salesforce Agentforce, ServiceNow AI Agents, Microsoft Copilot Studio. We'll discuss those in Chapter 18. They are real products and they ship real value. They are also lock-in by design, and they make the registry-and-evals layer the vendor's, not yours, which is fine until you want to consolidate across vendors. There is no neutral choice; there is only the choice you make on purpose.

Flywheel note

The compounding from this stack happens at Layers 3 and 6 — the registry and evals. Every new agent makes the registry more valuable (faster onboarding, more reuse) and every production run makes the eval suite better (more cases, more coverage). The teams that win in agentic AI win because those two layers are theirs and improve every quarter, regardless of which model is fashionable.
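The eval half of the flywheel can be sketched as a suite that grows from production: a reviewed failure becomes a regression case tagged with the trace it came from, and every new agent version is scored against the accumulated cases. `EvalSuite`, `EvalCase` and exact-match scoring are simplifying assumptions; real suites usually rely on rubric-based or model-graded scoring.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    input_text: str
    expected: str        # what a reviewer says the agent should have answered
    source_run_id: str   # provenance: the production trace this case came from


@dataclass
class EvalSuite:
    cases: list[EvalCase] = field(default_factory=list)

    def promote(self, run_id: str, input_text: str, corrected_output: str) -> None:
        """Turn a reviewed production failure into a permanent regression case."""
        if any(c.source_run_id == run_id for c in self.cases):
            return  # already promoted; keep the suite free of duplicates
        self.cases.append(EvalCase(input_text, corrected_output, run_id))

    def pass_rate(self, agent: Callable[[str], str]) -> float:
        if not self.cases:
            return 1.0
        passed = sum(agent(c.input_text) == c.expected for c in self.cases)
        return passed / len(self.cases)


suite = EvalSuite()
suite.promote("r1", "Where is my refund?", "Refunds take 5 business days.")
suite.promote("r7", "Cancel my order", "Order cancelled.")
suite.promote("r1", "Where is my refund?", "Refunds take 5 business days.")  # ignored duplicate


def candidate_agent(question: str) -> str:  # hypothetical stand-in for a new agent version
    return "Order cancelled." if "Cancel" in question else "I don't know."


print(len(suite.cases), suite.pass_rate(candidate_agent))  # 2 0.5
```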

Figure 5.1 The reference stack. Six layers, named honestly. The two thin ones, registry and evals, are where the compounding lives.