Chapter 18 · Memory and State — The Agentic Enterprise

Memory is the property that distinguishes a useful agent from a stateless question-answering system. Without memory, every interaction starts from the same blank slate, every piece of context must be re-supplied by the user, and the agent cannot accumulate the understanding of a domain, a user, or an ongoing task that makes sophisticated autonomous work possible. But memory also introduces some of the most significant risks in agentic deployment: an agent that remembers incorrectly is more dangerous than one that forgets, because it acts with unwarranted confidence; an agent that can be made to remember adversarially crafted content is a security vulnerability; and an agent whose memory is accessible to unauthorized parties is a data governance failure. Getting memory right requires thinking carefully about what the agent needs to remember, for how long, in what form, and under what access controls.

Three Kinds of Memory

Agentic memory is typically categorized into three types that differ in their scope, duration, and storage mechanism. In-context memory is the simplest: it is the content of the model's context window during a single agent run. Everything the agent knows during a run — the system prompt, the conversation history, the tool results, the intermediate reasoning — lives in the context window. In-context memory is fast and flexible, but it is bounded by the context window size (which, even at 128K or 200K tokens, is finite) and it evaporates entirely when the run ends. It is appropriate for single-turn tasks and short workflows, but insufficient for anything that spans multiple sessions or requires knowledge accumulated over time.
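The bound on in-context memory can be made concrete with a minimal sketch of history trimming. It assumes a crude estimate of four characters per token and a hypothetical `Message` type; neither comes from a specific framework, and a real deployment would use the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Replace with a real tokenizer.
    return max(1, len(text) // 4)


def fit_to_window(system_prompt: Message, history: list[Message], budget: int) -> list[Message]:
    """Keep the system prompt plus the most recent messages that fit in `budget` tokens."""
    remaining = budget - estimate_tokens(system_prompt.content)
    kept: list[Message] = []
    # Walk backwards so the newest messages are kept first.
    for msg in reversed(history):
        cost = estimate_tokens(msg.content)
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    # Restore chronological order before handing the list to the model.
    return [system_prompt] + list(reversed(kept))
```

Anything dropped by a function like this is gone for the rest of the run, which is why longer workflows need one of the persistent memory types described next.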

External memory extends the agent's effective knowledge by connecting it to a retrieval system: a vector database, a document store, a structured knowledge graph, or some combination of these. When the agent needs information that exceeds the context window, it issues a retrieval query and incorporates the results into its working context. External memory can be vast — terabytes of documents, years of interaction history — and it persists across runs. But retrieval introduces a new failure mode: the agent may retrieve information that is outdated, irrelevant, or adversarially crafted, and it has no reliable way to distinguish good retrieval results from bad ones without additional verification mechanisms.
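A minimal retrieval sketch, assuming an in-memory list of documents with precomputed embeddings, illustrates two simple verification checks: a similarity floor and a freshness limit. The `Document` fields, the thresholds, and the `retrieve` function are hypothetical, and these checks do nothing to detect adversarially crafted content.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    text: str
    embedding: list[float]
    updated_at: datetime
    source: str  # provenance travels with the content so it can be checked later


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], store: list[Document], now: datetime,
             k: int = 3, min_score: float = 0.5,
             max_age: timedelta = timedelta(days=365)) -> list[tuple[float, Document]]:
    """Return up to k (score, document) pairs that are both relevant and recent."""
    candidates = []
    for doc in store:
        if now - doc.updated_at > max_age:
            continue  # drop stale content
        score = cosine(query_embedding, doc.embedding)
        if score >= min_score:
            candidates.append((score, doc))  # keep only sufficiently similar content
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates[:k]
```

Returning the score and the source alongside the text lets a downstream step decide how much weight a retrieved passage deserves.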

Parametric memory is knowledge encoded in the model's weights through training and fine-tuning. Unlike in-context and external memory, parametric memory cannot be updated at runtime — changing it requires retraining or fine-tuning the model, which is expensive and time-consuming. Parametric memory is valuable for stable, domain-specific knowledge that the agent will need constantly — the vocabulary of a specialized field, the conventions of a particular document type, the behavioral policies that should always be active — but it is the wrong mechanism for knowledge that changes frequently or that must be auditable.

Episodic Memory and the Accumulation of Experience

A fourth category — episodic memory — is increasingly recognized as a distinct and important type. Episodic memory stores structured records of past agent runs: what task was attempted, what steps were taken, what succeeded and what failed, and what the outcome was. Unlike external memory, which stores factual content, episodic memory stores procedural history — the agent's accumulated experience of how to approach classes of task. An agent with access to its episodic memory can recognize that a new task is similar to one it has successfully completed before and adapt its approach accordingly; it can also recognize that a proposed sequence of steps has historically led to failure and generate an alternative plan.

Building effective episodic memory requires solving several hard problems. The records must be structured consistently enough to be retrievable — which means designing a schema for episode representation that captures the relevant dimensions of variation across tasks. They must be indexed effectively — which means choosing retrieval mechanisms that can surface semantically relevant episodes, not just syntactically similar ones. And they must be maintained — which means building processes to review, curate, and when necessary correct the episodic store, because an agent that has learned the wrong lesson from a past failure is more dangerous than one that learned nothing.
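One possible shape for an episode record is sketched below. The field names, the `reviewed` flag, and both helper functions are illustrative assumptions. Matching on an exact `task_type` string stands in for the semantic retrieval a real system would need.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    task_type: str      # coarse label used here in place of semantic indexing
    goal: str
    steps: list[str]
    succeeded: bool
    outcome: str
    reviewed: bool = False  # set once a curator has checked the record


def recall(store: list[Episode], task_type: str, reviewed_only: bool = True) -> list[Episode]:
    """Return past episodes for a task type, by default only curated ones."""
    return [
        e for e in store
        if e.task_type == task_type and (e.reviewed or not reviewed_only)
    ]


def known_failure(plan: list[str], episodes: list[Episode]) -> bool:
    """True if this exact sequence of steps has failed before."""
    return any(not e.succeeded and e.steps == plan for e in episodes)
```

Restricting recall to reviewed episodes reflects the maintenance point above: an uncurated failure record can teach the agent the wrong lesson.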

"The agent that cannot learn from its own history is condemned to repeat it. But the agent that learns the wrong lessons from a poisoned history may be more dangerous than the amnesiac."

State Management and the Persistence Problem

State management is the engineering problem of keeping track of where the agent is in a complex, multi-step workflow across interruptions, failures, and restarts. An agent that is interrupted mid-workflow — by a system timeout, a human-in-the-loop checkpoint, or an unexpected error — must be able to resume from where it left off without re-executing steps that have already been completed, without losing the context accumulated so far, and without violating the invariants of the workflow's state machine. This is a harder problem than it appears, because the state of an agent run is not just a set of values — it includes the model's internal state (which cannot be serialized directly), the pending tool calls (which may have side effects that cannot be safely retried), and the conversation history (which may contain sensitive information that must be stored with appropriate access controls).

LangGraph's persistence layer addresses this problem through a checkpoint architecture: after each step of the execution graph, the graph state is serialized and written to an external store (PostgreSQL, Redis, or a cloud-native equivalent). If execution is interrupted, it can be resumed from the most recent checkpoint without re-executing prior steps. This checkpointing architecture also provides a foundation for the forensic trail required by the record-keeping obligations in Article 12 of the EU AI Act: the checkpoint store is, in effect, a record of every state the agent passed through during its execution, which is what an auditor needs to reconstruct the sequence of events that led to a particular outcome.
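The checkpoint-and-resume pattern can be sketched with the standard library alone. This is a generic illustration, not LangGraph's API: the table layout and the `open_store`, `save_checkpoint`, `latest_checkpoint`, and `run` functions are assumptions, the state is assumed to be JSON-serializable, and the `fail_at` parameter exists only to simulate an interruption.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "run_id TEXT, step INTEGER, state TEXT, PRIMARY KEY (run_id, step))"
    )
    return conn


def save_checkpoint(conn, run_id: str, step: int, state: dict) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
        (run_id, step, json.dumps(state)),
    )
    conn.commit()


def latest_checkpoint(conn, run_id: str) -> tuple[int, dict]:
    row = conn.execute(
        "SELECT step, state FROM checkpoints WHERE run_id = ? "
        "ORDER BY step DESC LIMIT 1",
        (run_id,),
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else (-1, {})


def run(conn, run_id: str, steps, fail_at=None) -> dict:
    """Execute steps in order, resuming after the last checkpointed step."""
    last_step, state = latest_checkpoint(conn, run_id)
    for i in range(last_step + 1, len(steps)):
        if i == fail_at:
            raise RuntimeError(f"interrupted before step {i}")
        state = steps[i](state)
        save_checkpoint(conn, run_id, i, state)  # one row per completed step
    return state
```

A sketch like this still leaves a gap: if the process dies after a step's side effect but before its checkpoint is written, the step runs again on resume. Steps with external side effects therefore need to be idempotent or guarded by their own deduplication key.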

Memory Security and Data Governance

Memory stores are high-value targets for adversaries. An agent's vector store typically contains a condensed, semantically indexed representation of the organization's most important information — exactly the kind of information that an adversary wants to exfiltrate. Access controls on memory stores must be at least as rigorous as access controls on the source documents, and in practice they should be more rigorous, because the vector store's semantic indexing makes it easier to extract information efficiently than from the source documents themselves.
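One way to keep memory-store access at least as strict as source-document access is to carry each document's access list onto every chunk derived from it and filter retrieval results against the caller's groups. The sketch below assumes a hypothetical `Chunk` type and group-based permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset[str]  # copied from the source document's access list


def authorized(chunks: list[Chunk], caller_groups: set[str]) -> list[Chunk]:
    """Return only the chunks the caller could also read in the source system."""
    return [c for c in chunks if c.allowed_groups & caller_groups]
```

Filtering after retrieval is the simplest arrangement to show; applying the same predicate inside the store's query avoids loading restricted content at all.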

The specific attack known as memory poisoning — where an adversary injects crafted content into the agent's memory store with the intention of influencing its future behavior — is a significant and underappreciated threat. A prompt injection attack that succeeds in getting malicious instructions into the agent's episodic or external memory store is more dangerous than one that affects only a single run, because its effects persist across all subsequent runs that retrieve the poisoned content. Defense against memory poisoning requires both technical controls (input validation, anomaly detection on memory writes, human review of episodic memory updates) and architectural controls (separation between the agent's read path and write path, with stronger authentication required for writes).
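The separation of read and write paths can be sketched as a store whose reads are open to the agent and whose writes require an HMAC signature from a trusted writer. The `MemoryStore` class, the phrase list, and the three return values are illustrative assumptions. The phrase check is deliberately naive and stands in for real input validation and anomaly detection.

```python
import hashlib
import hmac

# Assumption: a tiny deny-list standing in for proper validation.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "disregard your instructions")


class MemoryStore:
    def __init__(self, write_key: bytes) -> None:
        self._write_key = write_key
        self._entries: list[str] = []
        self.review_queue: list[str] = []  # held for human review

    def read(self) -> list[str]:
        # Read path: no write credentials involved.
        return list(self._entries)

    def write(self, content: str, signature: str) -> str:
        # Write path: the caller must prove possession of the write key.
        expected = hmac.new(
            self._write_key, content.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            return "rejected"
        if any(p in content.lower() for p in SUSPICIOUS_PHRASES):
            self.review_queue.append(content)
            return "held_for_review"
        self._entries.append(content)
        return "stored"
```

Because the agent's retrieval code only ever calls `read`, a prompt injection that reaches the agent at runtime has no credential with which to persist itself.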

The Context Window Is Not Memory

A common misconception in early agentic deployments is that a large context window solves the memory problem. It doesn't. A 200K-token context window is large enough to hold a small book, but it is not large enough to hold the accumulated interaction history of a production agent running thousands of tasks per month. More importantly, stuffing the context window with everything the agent might possibly need is expensive (every token costs money), slow (inference latency scales with context length), and often counterproductive (models that receive more context than they need frequently exhibit degraded performance on the task at hand). Effective memory architecture selects what goes into the context window; it does not try to make the context window large enough to hold everything.
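Selection under a budget can be sketched as a greedy packing of scored candidates. The relevance scores are assumed to come from an upstream retrieval step, the four-characters-per-token estimate is a rough assumption, and `select_context` is a hypothetical helper, not a library function.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def select_context(candidates: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily pack the highest-scoring passages that fit within `budget` tokens."""
    chosen: list[str] = []
    remaining = budget
    for score, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if cost <= remaining:
            chosen.append(text)
            remaining -= cost
    return chosen
```

The budget here is a design parameter set well below the window size, so that cost and latency are controlled by choice instead of by the model's hard limit.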