Building an Agentic Enterprise  ·  Chapter 13 of 21
Chapter 13

The Risks That Bite

OWASP LLM Top 10 (2025), MITRE ATLAS, and prompt injection as the central threat

10
categories in OWASP's 2025 LLM Top 10
9.6
CVSS score for the GitHub Copilot RCE via indirect injection (CVE-2025-53773)
14
new agent-specific techniques added to MITRE ATLAS in October 2025
Save PDF

If Chapter 11 was about responding to incidents, this chapter is about the catalogue of incidents to expect. There are ten categories worth knowing by heart, and one — prompt injection — that you must internalise as the central threat of the field. Everything else is mitigation around it.

The 2025 OWASP top ten

The 2025 OWASP Top 10 for LLM Applications is the most useful single artefact for risk-ranking your agent program. The categories: prompt injection (LLM01); sensitive information disclosure (LLM02); supply-chain risks in models, data, and plugins (LLM03); data and model poisoning (LLM04); improper output handling — when LLM output is piped into code execution, SQL, or OS calls without validation (LLM05); excessive agency — too many tools, permissions, or autonomy (LLM06); system prompt leakage (LLM07); vector and embedding weaknesses, including RAG poisoning (LLM08); misinformation, where confabulated content drives high-stakes decisions (LLM09); and unbounded consumption — runaway agents, token spend, resource exhaustion (LLM10).

The categories most often involved in real incidents in 2025 were LLM01, LLM02, LLM05, and LLM06. The remaining six matter, but the four above are where the production losses concentrate. If your team can defend convincingly against those four, you are in a stronger position than most.

Prompt injection: direct and indirect

NIST has classified indirect prompt injection as "generative AI's greatest security flaw". The distinction matters. Direct injection is what most people imagine: an attacker types malicious instructions into the input box. It is detectable, loggable, and has a known mitigation surface (input filtering, prompt-boundary discipline, output anomaly detection).

Indirect injection is the form that has produced the major confirmed exploits of 2025. The attacker never speaks to the LLM. They plant instructions in content the model will later consume — a web page the agent browses, an email it summarises, a document it processes, a record it retrieves from a vector database, or a tool description it reads from an MCP server. The victim user triggers the attack by asking the agent to do something legitimate. The agent reads the planted text, follows the planted instructions, and acts.

2025's confirmed cases include zero-click data exfiltration from Microsoft 365 Copilot (CVE-2025-32711), remote code execution via GitHub Copilot (CVE-2025-53773, CVSS 9.6), and single-click exfiltration from Microsoft Copilot Personal. None exploited a traditional vulnerability. All exploited the agent's willingness to act on text it read.

The mitigation surface is described in detail in Chapter 14, but the headline is short: assume every external content source the agent reads is potentially adversarial, sanitise inputs, separate trusted prompts from untrusted content with explicit boundaries, monitor for anomalous tool-call sequences, and apply least-privilege at the tool layer so a successful injection has somewhere to fail safely.

Excessive agency

OWASP LLM06 is the risk that should be most familiar to anyone who has read this report carefully. It is the same risk Chapter 4 described with five questions and Chapter 9 described with action-consequence mapping: an agent that has more authority than its job actually requires.

The mitigation is structural: every agent's tool list is the minimum set required for its specific task; every credential is scoped to the task and time-limited; every irreversible or consequential action sits behind a human approval gate (Chapter 14); and every authority is reviewed on the schedule recorded in the registry. None of this is novel security thinking. What is new is that the principal demanding the permission is a model whose behaviour is non-deterministic, which is why the discipline has to be technical rather than procedural.

Questions to ask

For your most autonomous agent: list the tools it can invoke. For each, can you state — without checking — whether the access is read-only or write, what scope of records it can affect, and whether a human approves any action? If the list is longer than five tools or you cannot answer for any of them, you have an excessive-agency problem to fix this quarter.

Agent IAM

Agents are a new principal type in identity and access management. They are not users (they don't authenticate as humans), they are not services (they act on behalf of specific humans for specific tasks), and they are not bots in the older sense (they make non-deterministic decisions about what to do with their permissions). MCP's OAuth 2.1 with PKCE support is the current best mechanism for the on-behalf-of pattern, and is the right default for agents acting in a user's context.

Four principles make agent IAM survivable. Least privilege, always: every agent has the minimum permissions required, and nothing more, and the temptation to over-grant for convenience is resisted. Scoped, time-limited credentials: an agent handling a support ticket holds credentials only for the duration of that ticket, not the lifetime of the agent. OAuth on behalf of users: when an agent acts in a user's context, it authenticates as that user via delegated authorisation, with consent — not as a service principal with elevated rights. Agent identity in audit logs: every action attributed to a specific agent identity (not "the AI system") and linked back to the human principal who authorised the session.

MITRE ATLAS now documents 14 agent-specific attack techniques beyond the original ATT&CK catalogue, including AI Agent Context Poisoning, Memory Manipulation, and Thread Injection. ATLAS is the threat-model vocabulary the security team should be using. The next chapter takes the same risks and asks the inverse question: where does the human belong in the loop, and what does the loop look like when the human is doing real work, not rubber-stamping?

OWASP LLM Top 10 (2025) Bars indicate relative share of confirmed 2025 production incidents. Four categories dominate. LLM01 prompt injection direct + indirect LLM02 sensitive disclosure PII · keys · system prompts LLM05 improper output handling unsafe code/sql/email LLM06 excessive agency over-tooled · over-scoped LLM09 misinformation confabulation in flows LLM04 data + model poisoning LLM08 vector + RAG weakness LLM10 unbounded consumption LLM03 supply chain LLM07 system prompt leakage Indicative ranking, drawn from public 2025 advisories (CVE-2025-32711, CVE-2025-53773, ATLAS case studies). Calibrate to your own incident corpus.
Figure 13.1OWASP LLM Top 10 (2025) ranked by frequency of confirmed production incidents in 2025. Prompt injection, sensitive disclosure, improper output handling, and excessive agency dominate.