The economics of the first agent pilot are not the economics of a hundred agents at scale. The first pilot is priced like an experiment: the cost is dominated by engineering time, the benefit is measured in learning, and no one expects the unit economics to be favourable. The hundredth agent is priced like a product: the cost is dominated by inference, orchestration, and governance overhead, and the benefit is measured in gross margin. The organisation that does not understand this transition will over-invest in early pilots and under-invest in the infrastructure that makes scale economical.
Unit economics of an agent
The unit cost of an agent action (one complete execution of the agent's task, from receiving the input to completing the output) is the sum of four cost categories.
Inference cost: the API fees paid to the model provider for the tokens consumed in planning, tool calls, and generating the output. For a complex multi-step agent, this can be significantly higher than a simple Q&A prompt: a ten-step agent that generates a 200-token plan, makes five tool calls, and produces a 500-token output may consume twenty to forty times the tokens of a single-turn completion.
Orchestration cost: the compute cost of running the orchestration framework, managing state, and executing the tool integrations. For most agents, this is dominated by the cost of storing and retrieving context; for agents with long task horizons, memory costs can equal or exceed inference costs.
Evaluation cost: the cost of running the evals harness on each deployment, which is often overlooked as a production cost but becomes significant at scale.
Governance overhead: the human time spent reviewing edge cases, managing incidents, and maintaining the policy stack. This is the cost that scales with the complexity of the agent estate, not with the volume of individual agent actions.
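The four categories can be combined into a per-action figure along the following lines. This is a minimal sketch rather than a pricing model: the class name, field names, token prices, and monthly figures are all hypothetical placeholders. Spreading evaluation and governance costs over monthly action volume is an assumption made here for illustration, since governance overhead in particular does not scale with volume.

```python
from dataclasses import dataclass


@dataclass
class AgentCostInputs:
    """Illustrative inputs only; every figure supplied is a placeholder."""
    input_tokens: int                  # tokens sent to the model per action
    output_tokens: int                 # tokens generated per action
    price_in_per_1k: float             # assumed price per 1,000 input tokens
    price_out_per_1k: float            # assumed price per 1,000 output tokens
    orchestration_per_action: float    # compute, state and memory per action
    eval_cost_per_month: float         # evals harness runs in the month
    governance_cost_per_month: float   # human review and policy upkeep
    actions_per_month: int


def unit_cost(c: AgentCostInputs) -> dict[str, float]:
    """Return the per-action cost broken down into the four categories."""
    if c.actions_per_month <= 0:
        raise ValueError("actions_per_month must be positive")
    inference = ((c.input_tokens / 1000) * c.price_in_per_1k
                 + (c.output_tokens / 1000) * c.price_out_per_1k)
    breakdown = {
        "inference": inference,
        "orchestration": c.orchestration_per_action,
        # Assumption: monthly costs are amortised evenly over monthly volume.
        "evaluation": c.eval_cost_per_month / c.actions_per_month,
        "governance": c.governance_cost_per_month / c.actions_per_month,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical agent: 20,000 input tokens and 1,000 output tokens per action.
example = AgentCostInputs(
    input_tokens=20_000, output_tokens=1_000,
    price_in_per_1k=0.003, price_out_per_1k=0.015,
    orchestration_per_action=0.01,
    eval_cost_per_month=500.0, governance_cost_per_month=4_000.0,
    actions_per_month=100_000,
)
costs = unit_cost(example)
for category, value in costs.items():
    print(f"{category}: {value:.4f}")
```

Because the amortised terms divide by volume, the same agent looks far more expensive per action at pilot volumes than at production volumes, which is one way to see the pilot-to-scale transition in numbers.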
McKinsey's 2024 State of AI report found that infrastructure and operational costs were the primary barrier to AI scaling for organisations that had moved beyond the pilot phase. Moving beyond the pilot phase is therefore the point at which inference cost optimisation (model distillation, prompt caching, batching) becomes a material business decision rather than an engineering preference.
Gross margin impact
The gross margin impact of an agent programme depends critically on which cost line the agent replaces or augments. An agent that replaces a portion of a variable cost line produces a straightforward unit-economics calculation: cost per agent action versus cost per human action, multiplied by volume. A customer service agent that handles tier-one queries that would otherwise go to an outsourced contact centre charging a per-interaction fee is one example. If the agent action costs $0.12 and the human action costs $4.50, and the agent can handle 70% of the query volume at acceptable quality, the saving is $4.38 on each query the agent handles, or roughly $3.07 per incoming query averaged across the whole volume, assuming the remaining 30% still go to the contact centre at the full fee.
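The replacement case can be worked through as a short calculation. The sketch below uses the per-action costs and the 70% share from the example; the monthly volume of 100,000 queries and the function name are hypothetical. It also makes a simplifying assumption that queries the agent cannot handle incur only the human cost.

```python
def replacement_saving(agent_cost: float, human_cost: float,
                       agent_share: float, volume: int) -> dict[str, float]:
    """Compare an all-human baseline with a mixed agent/human operation."""
    if not 0.0 <= agent_share <= 1.0:
        raise ValueError("agent_share must be between 0 and 1")
    if volume <= 0:
        raise ValueError("volume must be positive")
    baseline = human_cost * volume
    # Simplifying assumption: queries outside the agent's share go straight
    # to a human and incur no agent cost. Escalated queries may in practice
    # cost both an agent attempt and a human action.
    mixed = volume * (agent_share * agent_cost
                      + (1 - agent_share) * human_cost)
    saving = baseline - mixed
    return {
        "baseline_cost": baseline,
        "mixed_cost": mixed,
        "saving": saving,
        "saving_per_query": saving / volume,
    }


# Figures from the text, with a hypothetical monthly volume.
result = replacement_saving(agent_cost=0.12, human_cost=4.50,
                            agent_share=0.70, volume=100_000)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

Varying `agent_share`, or adding a cost for escalated queries, shows how sensitive the result is to the quality threshold at which the agent is allowed to act alone.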
The calculation is less straightforward when the agent augments rather than replaces — when it makes a human faster rather than substituting for the human. Here, the gross margin impact is a function of the productivity multiplier: how much more work can the augmented human do in the same time? A research analyst who uses an agent to complete the first draft of a market analysis in two hours instead of twelve can now do six analyses a week instead of one. The gross margin impact depends on whether the firm can deploy those additional analyses in revenue-generating ways, which is a business model question, not a technology question.
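The augmentation case can be sketched in the same way. The hours and the baseline of one analysis a week come from the example above; everything else is a hypothetical assumption introduced for illustration: the function name, the `monetised_fraction` parameter (the share of newly freed capacity the firm actually sells), the margin per analysis, and the agent cost per analysis. The analyst's salary is treated as fixed and left out.

```python
def augmentation_impact(hours_before: float, hours_after: float,
                        outputs_before: float, monetised_fraction: float,
                        margin_per_output: float,
                        agent_cost_per_output: float) -> dict[str, float]:
    """Estimate weekly margin impact when an agent makes a human faster."""
    if hours_before <= 0 or hours_after <= 0:
        raise ValueError("hours must be positive")
    if not 0.0 <= monetised_fraction <= 1.0:
        raise ValueError("monetised_fraction must be between 0 and 1")
    multiplier = hours_before / hours_after
    extra_capacity = outputs_before * (multiplier - 1)
    # Only the part of the new capacity that is sold produces margin.
    extra_sold = extra_capacity * monetised_fraction
    # Assumption: the agent is used on every output actually produced.
    outputs_produced = outputs_before + extra_sold
    agent_cost = outputs_produced * agent_cost_per_output
    return {
        "multiplier": multiplier,
        "extra_capacity": extra_capacity,
        "extra_sold": extra_sold,
        "margin_impact": extra_sold * margin_per_output - agent_cost,
    }


# Twelve hours down to two, one analysis a week before the agent.
scenarios = {
    fraction: augmentation_impact(
        hours_before=12, hours_after=2, outputs_before=1,
        monetised_fraction=fraction,
        margin_per_output=2_000.0,      # hypothetical
        agent_cost_per_output=15.0,     # hypothetical
    )
    for fraction in (0.0, 0.4, 1.0)
}
for fraction, outcome in scenarios.items():
    print(f"monetised {fraction:.0%}: {outcome['margin_impact']:,.2f}")
```

When none of the extra capacity is monetised the impact is slightly negative, because the agent cost is still paid. The productivity multiplier is identical in all three scenarios; only the business model differs.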
"The agent is not the margin expansion. The agent is the option on margin expansion. Exercising the option requires rethinking the business model around the new capacity." — synthesised from multiple enterprise AI economics discussions.
Second-order effects
The unit economics model captures the direct cost and benefit of the agent. It does not capture the second-order effects, which are often larger. The most important second-order effects are the following.
Quality drift: as the agent handles more volume, the distribution of inputs shifts, and the agent may be handling a higher proportion of edge cases than its evals were designed for.
Dependency concentration: as more business processes depend on the agent, a model provider outage or a model version regression produces business impact that was not in the original risk model.
Labour market effects: as the agent handles more of the baseline work, the humans in the loop become more focused on edge cases, which over time changes the skill mix required and the career path available.
Regulatory exposure: as the agent estate grows, the regulatory surface grows with it, and a programme that was immaterial at five agents becomes material at fifty.
These second-order effects are the reason Gartner's AI TRiSM (AI trust, risk and security management) framework includes a continuous monitoring function rather than treating AI risk as a one-time assessment. The economics that justified the programme at the start of year one may not justify its scale at the start of year two if the second-order effects have materialised in ways the original model did not anticipate.
Cost governance for the agent estate
An agent estate without cost governance will produce surprises. Inference costs in particular are difficult to predict, because they depend on the distribution of input complexity, which is only known empirically from production data. The cost governance discipline for an agentic AI programme mirrors the cost governance discipline for a cloud computing estate: tagging every agent action to a cost centre, setting budget alerts, establishing per-agent cost budgets that trigger a review if exceeded, and building the inference cost per action into the unit-economics model for every use case.
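The tagging and budget mechanics described above can be sketched as follows. This is an illustration under stated assumptions, not a reference design: the record type, the function names, the agent and cost-centre identifiers, and the 80% alert threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRecord:
    """One agent action, tagged so its cost can be attributed."""
    agent_id: str
    cost_centre: str
    cost: float  # cost of this action in the reporting currency


def spend_by(records, key):
    """Total cost grouped by a tag such as 'agent_id' or 'cost_centre'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost
    return dict(totals)


def budget_flags(records, budgets, alert_ratio=0.8):
    """Classify each budgeted agent as 'ok', 'alert' or 'review'.

    'review' means the budget has been exceeded; 'alert' means spend has
    reached alert_ratio of the budget (an assumed early-warning threshold).
    """
    spend = spend_by(records, "agent_id")
    flags = {}
    for agent_id, budget in budgets.items():
        used = spend.get(agent_id, 0.0)
        if used > budget:
            flags[agent_id] = "review"
        elif used >= alert_ratio * budget:
            flags[agent_id] = "alert"
        else:
            flags[agent_id] = "ok"
    return flags


# Hypothetical month-to-date records and per-agent budgets.
records = [
    ActionRecord("tier1-support", "CC-100", 60.0),
    ActionRecord("tier1-support", "CC-100", 50.0),
    ActionRecord("research-draft", "CC-200", 45.0),
    ActionRecord("invoice-triage", "CC-300", 10.0),
]
budgets = {"tier1-support": 100.0, "research-draft": 50.0,
           "invoice-triage": 40.0}

flags = budget_flags(records, budgets)
centre_totals = spend_by(records, "cost_centre")
print(flags)
print(centre_totals)
```

In a real estate the records would come from the provider's usage data or the orchestration layer's logs rather than a hand-built list; the point of the sketch is only that every action carries both an agent tag and a cost-centre tag from the moment it is recorded.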
The cost governance owner — typically the AI programme manager in collaboration with the finance team — should produce a monthly cost and benefit report for the agent estate, structured by use case, that shows the unit economics for each active agent, the trend over the preceding three months, and a projection for the next quarter. This report is not a project management artefact; it is a board-level instrument for assessing whether the agent programme is delivering the economics that justified it.
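The trend and projection columns of such a report could be produced along these lines. The sketch assumes a deliberately naive method, straight-line extrapolation of the last three monthly unit costs, and uses hypothetical agent names and figures (unit cost in cents per action). A real report would likely use a method agreed with the finance team.

```python
def project_next_quarter(last_three: list[float]) -> dict[str, object]:
    """Extrapolate three months of unit cost linearly over the next three.

    Assumption: the average month-on-month change over the preceding three
    months continues unchanged. Projections are floored at zero because a
    straight line can otherwise run negative.
    """
    if len(last_three) != 3:
        raise ValueError("expected exactly three monthly values")
    first, _, last = last_three
    monthly_change = (last - first) / 2
    projection = [max(0.0, last + monthly_change * k) for k in (1, 2, 3)]
    return {"monthly_change": monthly_change, "projection": projection}


# Hypothetical unit costs in cents per action, oldest month first.
history = {
    "tier1-support": [16.0, 14.0, 12.0],
    "research-draft": [40.0, 44.0, 48.0],
}

report = {agent: project_next_quarter(costs)
          for agent, costs in history.items()}
for agent, row in report.items():
    print(agent, row["monthly_change"], row["projection"])
```

An agent whose unit cost is rising month on month, like the second one here, is the kind of line item the report is meant to surface before it becomes a budget surprise.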