Chapter 15 · Walking the NIST AI RMF — The Agentic Enterprise

The NIST AI Risk Management Framework was released in January 2023 as a voluntary, non-prescriptive guide for managing the risks of AI systems across their entire lifecycle. Its four functions — Govern, Map, Measure, Manage — form a practical loop that organizations can walk regardless of their sector, their regulatory context, or the specific AI technology they are deploying. Walking the RMF for an agentic program is different from walking it for a conventional predictive model, because the risk surface is wider, the failure modes are more diverse, and the velocity of change is higher. This chapter follows that walk concretely, pausing at each function to identify where the standard guidance strains and where enterprise practitioners have had to improvise.

Govern: The Foundation

The Govern function is about creating the organizational conditions under which responsible AI risk management can happen. In the RMF's framing, Govern asks: does the organization have policies, roles, and processes in place to make risk-informed decisions about AI? For an agentic program, the Govern function maps directly to the governance charter and policy stack discussed in the preceding two chapters. But the RMF adds a specific emphasis that practitioners sometimes underweight: accountability structures must be documented and tested, not merely declared.

Testing accountability means running tabletop exercises that simulate agentic incidents and asking: who actually gets called, in what order, and do they have the authority and the information they need to respond effectively? Organizations that skip this step frequently discover, during an actual incident, that their accountability structures are aspirational rather than operational — the CISO is listed as the incident commander but has no runbook for an agent that has exfiltrated data through a tool the security team didn't know existed.

The RMF's Govern function also addresses organizational culture. The framework is explicit that risk management cannot succeed if it is treated as a compliance exercise rather than a genuine operational priority. For agentic AI, this cultural dimension is particularly important because the technology is moving faster than any governance framework can track, and the only durable protection is a team that is genuinely curious about failure modes rather than one that is merely checking boxes on an audit schedule.

Map: Understanding the Risk Surface

The Map function asks organizations to identify and categorize the risks associated with a specific AI system in a specific deployment context. For conventional models, this typically means a risk taxonomy organized around model performance dimensions: accuracy, fairness, robustness, privacy. For agents, the Map function must extend to cover a substantially wider risk surface that includes the agent's tools, its memory architecture, its multi-agent interactions, and the potential for adversarial manipulation of any of these components.

The concept of an "agentic fitness gap" has emerged from practitioners attempting to apply the standard RMF risk taxonomy to agentic systems. The gap refers to the distance between what the RMF's default risk categories capture and what actually goes wrong with agents in production. Standard categories like "bias" and "accuracy" are meaningful for agentic systems, but they miss the failure modes that are most operationally significant: unauthorized tool use, context window poisoning, cross-agent trust exploitation, and resource exhaustion from runaway loops. Closing the fitness gap requires extending the Map function's risk taxonomy with agent-specific categories.
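One way to make the fitness gap concrete is to hold the extended taxonomy as data and check a system's risk register against it. The sketch below is illustrative only: the category names come from the paragraph above, while `RiskEntry`, `fitness_gap`, and the example system name are hypothetical and not part of the RMF.

```python
from dataclasses import dataclass
from enum import Enum


class RiskCategory(Enum):
    # Categories typical of conventional model risk taxonomies.
    ACCURACY = "accuracy"
    FAIRNESS = "fairness"
    ROBUSTNESS = "robustness"
    PRIVACY = "privacy"
    # Agent-specific extensions named in the text.
    UNAUTHORIZED_TOOL_USE = "unauthorized_tool_use"
    CONTEXT_POISONING = "context_window_poisoning"
    CROSS_AGENT_TRUST = "cross_agent_trust_exploitation"
    RUNAWAY_LOOP = "resource_exhaustion_runaway_loop"


AGENT_SPECIFIC = frozenset({
    RiskCategory.UNAUTHORIZED_TOOL_USE,
    RiskCategory.CONTEXT_POISONING,
    RiskCategory.CROSS_AGENT_TRUST,
    RiskCategory.RUNAWAY_LOOP,
})


@dataclass(frozen=True)
class RiskEntry:
    """One row in a (hypothetical) risk register for a deployed system."""
    system: str
    category: RiskCategory
    description: str


def fitness_gap(register):
    """Return the agent-specific categories that have no entry in the register."""
    covered = {entry.category for entry in register}
    return AGENT_SPECIFIC - covered
```

A register that lists only accuracy and unauthorized tool use would return the three remaining agent-specific categories, which is the list a Map review still has to work through.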

The Cloud Security Alliance has proposed an Agentic Profile extension to the NIST RMF that adds exactly these categories, along with a structured process for mapping the trust relationships between agents in a multi-agent system. The extension remains in draft form as of early 2026, but several large financial services organizations have begun piloting it in their internal RMF implementations.
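The draft's own mapping process is not reproduced here. As a rough illustration of what mapping trust relationships can involve, the sketch below treats each relationship as a directed edge and computes which agents can indirectly influence a given agent. The edge semantics, the function name `transitive_trust`, and the agent names are assumptions made for this example.

```python
from collections import deque


def transitive_trust(edges, start):
    """Return every agent whose output can reach `start`.

    Each edge is (truster, trustee): the truster accepts output from the
    trustee. Following edges outward from `start` therefore finds all
    direct and indirect sources of influence on it.
    """
    graph = {}
    for truster, trustee in edges:
        graph.setdefault(truster, set()).add(trustee)

    reachable = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for trustee in graph.get(node, ()):
            if trustee != start and trustee not in reachable:
                reachable.add(trustee)
                queue.append(trustee)
    return reachable
```

If a payments agent trusts a planner and the planner trusts a web research agent, the result for the payments agent includes the web research agent, even though no direct relationship between the two was ever declared.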

Figure 15.1. Five levels. The jump from L2 to L3 is where most agent programs stall: it requires standards, not just teams.

Measure: Evaluating What You Cannot Fully Observe

The Measure function asks organizations to assess, analyze, and track the risks they have identified. For conventional models, measurement typically means evaluating performance metrics on a held-out test set and tracking those metrics in production through model monitoring. For agents, measurement is fundamentally harder because the agent's behavior in production is not a simple mapping from inputs to outputs — it is a sequence of decisions, tool calls, and state transitions, many of which are not directly observable through conventional monitoring.

The NIST AI Agent Standards Initiative, announced in February 2026, is developing an AI Agent Interoperability Profile planned for Q4 2026 that will, among other things, define standard telemetry formats for agentic systems. Until that profile is available, organizations are improvising: using OpenTelemetry traces to capture the sequence of tool calls and model invocations that constitute an agent run, and building custom evaluation harnesses that measure behavioral properties — instruction-following fidelity, tool use appropriateness, escalation correctness — rather than just output quality.
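A custom harness of this kind can be quite small. The sketch below assumes a trace has already been exported from the tracing backend into a flat list of steps; the `Step` shape, the `evaluate_run` function, and the two metrics are hypothetical simplifications, not a standard telemetry format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str  # "model_call", "tool_call", or "escalation" (assumed labels)
    name: str


def evaluate_run(steps, allowed_tools, should_escalate):
    """Score one agent run on two behavioral properties.

    tool_appropriateness: share of tool calls that used an allowed tool.
    escalation_correct: whether the run escalated exactly when expected.
    """
    tool_calls = [s for s in steps if s.kind == "tool_call"]
    appropriate = [s for s in tool_calls if s.name in allowed_tools]
    escalated = any(s.kind == "escalation" for s in steps)
    return {
        # A run with no tool calls cannot have misused a tool.
        "tool_appropriateness": (
            len(appropriate) / len(tool_calls) if tool_calls else 1.0
        ),
        "escalation_correct": escalated == should_escalate,
    }
```

Scoring behavior per run, rather than only the final output, surfaces a run that produced a plausible answer while calling a tool it should not have touched.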

The Measure function also includes an often-overlooked component: uncertainty quantification. An agent that is uncertain about the right action presents a different risk profile from an agent that is confidently wrong. Emerging evaluation frameworks like Langfuse and Arize Phoenix are beginning to expose uncertainty signals from the underlying model as part of their trace data, which allows risk teams to flag runs where the agent acted confidently in a domain where its historical accuracy has been low.
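The flagging rule described above reduces to a join between per-run confidence and per-domain historical accuracy. The sketch below assumes both are already available as plain Python data. The field names, the `flag_overconfident` function, and both thresholds are hypothetical and would need tuning for a real program.

```python
def flag_overconfident(runs, domain_accuracy,
                       conf_threshold=0.9, acc_threshold=0.7):
    """Return IDs of runs that were confident in a historically weak domain.

    runs: iterable of dicts with "run_id", "domain", and "confidence" keys.
    domain_accuracy: dict mapping a domain name to historical accuracy (0..1).
    Both thresholds are illustrative defaults, not recommended values.
    """
    flagged = []
    for run in runs:
        accuracy = domain_accuracy.get(run["domain"])
        if accuracy is None:
            # No history for this domain: nothing to compare against.
            continue
        if run["confidence"] >= conf_threshold and accuracy < acc_threshold:
            flagged.append(run["run_id"])
    return flagged
```

Runs in domains with no history are skipped here for simplicity; a stricter program might flag them instead, since absent history is itself a risk signal.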

"The NIST RMF's Manage function is where governance meets engineering. An organization that has governed and mapped and measured but cannot stop a running agent that is behaving badly has built a risk program that ends at the page."

Manage: Operationalizing the Response

The Manage function asks organizations to prioritize and respond to the risks they have measured. In the RMF's framing, this includes both proactive risk treatment — changing the system to reduce risk before it materializes — and reactive incident response — containing and remediating harms after they occur. For agents, the Manage function maps to three concrete operational capabilities: the ability to modify an agent's tool permissions without redeploying it, the ability to roll back to a prior model version when a behavioral regression is detected, and the ability to hard-stop a running agent without losing the state needed for post-incident analysis.
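The third capability, a hard stop that keeps the evidence, can be sketched as a run loop that checks a stop flag before each step and returns a snapshot either way. This is a minimal illustration under simplifying assumptions: `AgentRun` is a hypothetical class, steps are plain strings, and a real orchestrator would also need to interrupt steps that are already in flight.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    run_id: str
    history: list = field(default_factory=list)
    stop_flag: threading.Event = field(default_factory=threading.Event)

    def execute(self, planned_steps):
        """Run steps in order, halting before any step once the flag is set."""
        for step in planned_steps:
            if self.stop_flag.is_set():
                return self.snapshot(status="halted")
            self.history.append(step)  # stand-in for actually doing the step
        return self.snapshot(status="completed")

    def snapshot(self, status):
        """Capture what the run did so far for post-incident analysis."""
        return {
            "run_id": self.run_id,
            "status": status,
            "history": list(self.history),
        }
```

An operator or an automated monitor calls `stop_flag.set()` from another thread. The snapshot returned by the halted run then shows exactly which steps completed before the stop took effect.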

These three capabilities are not difficult to build, but they require intentional design. An orchestration architecture that cannot modify tool permissions at runtime — because they are baked into the agent's system prompt rather than managed through a policy engine — cannot execute the Manage function's risk treatment actions quickly enough to prevent a slow-moving harm from becoming a large one. The alignment between the RMF's Manage function requirements and the architectural choices in the orchestration stack is one of the most important and least-discussed connections in enterprise agentic AI deployment.
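The difference between permissions baked into a prompt and permissions held in a policy engine shows up clearly in code. In the sketch below, every tool call passes through a policy check, so a grant can be revoked while the agent keeps running. `ToolPolicy`, `call_tool`, and the agent and tool names are hypothetical, and a production policy engine would add authentication, audit logging, and persistence.

```python
class ToolPolicy:
    """In-memory policy store mapping each agent to its permitted tools."""

    def __init__(self, grants):
        self._grants = {agent: set(tools) for agent, tools in grants.items()}

    def revoke(self, agent, tool):
        # Takes effect on the very next call; no redeploy is needed.
        self._grants.get(agent, set()).discard(tool)

    def is_allowed(self, agent, tool):
        return tool in self._grants.get(agent, set())


def call_tool(policy, agent, tool, tools, *args):
    """Invoke a tool only if the current policy permits it."""
    if not policy.is_allowed(agent, tool):
        raise PermissionError(f"{agent} may not call {tool}")
    return tools[tool](*args)


# Usage: revoke a permission while the agent is still running.
tools = {"send_email": lambda to: f"sent to {to}"}
policy = ToolPolicy({"support-agent": ["send_email"]})
first = call_tool(policy, "support-agent", "send_email", tools, "a@example.com")
policy.revoke("support-agent", "send_email")
# Any further call_tool(...) for send_email now raises PermissionError.
```

Because the check happens at call time rather than at prompt-construction time, the risk treatment is a data change rather than a redeployment.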