Building an Agentic Enterprise  ·  Chapter 15 of 21

Klarna: The Most Honest Story on the Public Record

What actually happened — fire, walkback, $60M, and the layered architecture that worked

853 FTE: equivalent work the agent did by Nov 2025
$60M/yr: annual savings reported on Klarna's first earnings call as a public company
5,500 → 3,000: headcount, while doubling revenue, with customer satisfaction maintained

Klarna's agent is the most-discussed enterprise AI deployment of the last two years and the most consistently misread. The headline arc — fire, hire back, walk back — is wrong. The actual arc is more interesting and more useful as a template.

What they shipped

In February 2024, Klarna deployed an OpenAI-powered AI assistant for customer service. Within its first month, the assistant handled 2.3 million conversations — two-thirds of all customer service chats — doing the equivalent work of 700 full-time agents across 35 languages in 23 markets. Resolution time dropped from 11 minutes to under two. Repeat inquiries fell 25%. Customer satisfaction was on par with human agents. The company projected a $40M profit improvement for 2024.

The press received it as a victory and a warning at the same time. The implicit story — "Klarna replaced 700 humans with one AI" — became a fixture of every subsequent vendor pitch and every think-piece on AI displacement. It was a useful headline. It was also incomplete in ways that mattered.

The partial walkback

In May 2025, CEO Sebastian Siemiatkowski told Bloomberg that "cost was a too predominant evaluation factor" and that the AI-first strategy "resulted in lower quality." Klarna announced it was hiring human agents again, describing an "Uber-type" flexible remote model. The press received this as a reversal. The headline became "Klarna fires its AI and rehires humans" — a tidy narrative arc with a moral about the limits of automation.

That narrative was wrong. It read the May 2025 Bloomberg comment as a retreat when it was a calibration. Klarna did not turn off the agent. It added a small human tier for emotional and complex interactions, while keeping the agent doing the work it had always done well.

The full story

By November 2025, on Klarna's first earnings call as a public company, the picture became visible. The AI assistant was doing the work of 853 full-time agents — up from 700 — saving roughly $60M annually. Klarna had cut overall headcount from about 5,500 to under 3,000. Revenue had doubled. Customer satisfaction was still on par with humans. A careful reading of the public statements shows the architecture was right; the operational design — full replacement, no human tier — was the part that needed adjustment.

So what was the "rehire"? The added human tier was small, scoped to complex and emotionally charged interactions, and structured as flexible/remote work rather than a return to the prior contact-centre model. The total economic impact compared to the pre-agent baseline remained dramatically positive. The story is not "AI is not ready." The story is "full replacement is not the right operational design, even when the agent works."

A practitioner's note

The single most expensive decision in the Klarna story was not technical. It was the decision to deploy as a full replacement rather than a layered system. The technology made the decision tempting; the calibration cost was the price of yielding to that temptation. Read this carefully: layered AI + human is not a compromise. It is the architecture that compounds.

The template, read carefully

What the careful reader takes from Klarna is not "AI replaces humans" or "AI fails." It is something more useful: a layered design works, and the layers belong in this order. The agent handles the high-volume, repeatable, well-scoped majority of interactions. A small human tier handles the emotionally complex minority and the cases the agent's confidence flags as low. The agent's eval suite ingests the human tier's resolutions as new golden-set candidates. The flywheel runs.
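The layering described above can be sketched in a few lines of Python. This is a minimal illustration, not Klarna's implementation: the confidence threshold, the topic labels, and the names `Interaction`, `GoldenSet`, `route`, and `handle` are all hypothetical, and the resolver callbacks stand in for whatever agent and human tooling a real deployment would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical values for illustration; none of these come from Klarna.
CONFIDENCE_FLOOR = 0.75
EMOTIONAL_TOPICS = {"bereavement", "fraud_dispute", "financial_hardship"}


@dataclass
class Interaction:
    id: str
    topic: str
    agent_confidence: float  # 0.0 to 1.0, assumed to be reported by the agent


@dataclass
class GoldenSet:
    """Holds human resolutions proposed as new eval cases."""
    candidates: List[Tuple[str, str]] = field(default_factory=list)

    def propose(self, interaction: Interaction, resolution: str) -> None:
        self.candidates.append((interaction.id, resolution))


def route(interaction: Interaction) -> str:
    """Send emotionally complex or low-confidence cases to the human tier."""
    if interaction.topic in EMOTIONAL_TOPICS:
        return "human"
    if interaction.agent_confidence < CONFIDENCE_FLOOR:
        return "human"
    return "agent"


def handle(
    interaction: Interaction,
    golden_set: GoldenSet,
    human_resolve: Callable[[Interaction], str],
    agent_resolve: Callable[[Interaction], str],
) -> Tuple[str, str]:
    """Resolve one interaction and feed human resolutions back to the evals."""
    tier = route(interaction)
    if tier == "human":
        resolution = human_resolve(interaction)
        # The flywheel step: each human resolution becomes a golden-set candidate.
        golden_set.propose(interaction, resolution)
    else:
        resolution = agent_resolve(interaction)
    return tier, resolution
```

The design choice worth noticing is that escalation and eval ingestion live in the same code path, so the human tier cannot grow without also growing the agent's test coverage.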

The metrics worth borrowing from Klarna: containment rate (percent fully resolved by the agent without escalation, Klarna at roughly two-thirds), resolution time (Klarna's drop from 11 minutes to under two, a reduction of about 82%, is the right order of magnitude for a successful CX deployment), repeat-inquiry rate (the agent should not be making the customer come back), and customer satisfaction parity (a non-negotiable floor: if the agent's CSAT is materially lower than the human agents', the architecture is wrong, not the operational tuning).
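As a rough sketch of how these four metrics might be computed, the Python below works over plain lists of per-conversation flags. The function names, the input shapes, and the parity tolerance are assumptions made for illustration; what counts as a "repeat" or as "materially lower" is a decision each team has to make for itself.

```python
from typing import Sequence


def _rate(flags: Sequence[bool]) -> float:
    """Share of True values; refuses an empty sample rather than guessing."""
    if not flags:
        raise ValueError("need at least one conversation")
    return sum(flags) / len(flags)


def containment_rate(resolved_by_agent: Sequence[bool]) -> float:
    """Fraction of conversations fully resolved by the agent, no escalation."""
    return _rate(resolved_by_agent)


def repeat_inquiry_rate(came_back: Sequence[bool]) -> float:
    """Fraction of conversations followed by a repeat contact."""
    return _rate(came_back)


def resolution_time_reduction(before_minutes: float, after_minutes: float) -> float:
    """Relative drop in resolution time, e.g. 11 -> 2 minutes is 9/11."""
    if before_minutes <= 0:
        raise ValueError("baseline must be positive")
    return (before_minutes - after_minutes) / before_minutes


def csat_at_parity(agent_csat: float, human_csat: float, tolerance: float = 0.05) -> bool:
    """True if agent CSAT is within a (hypothetical) tolerance of human CSAT."""
    return agent_csat >= human_csat - tolerance


# Using the chapter's figures: 11 minutes down to (at most) 2 minutes.
print(round(resolution_time_reduction(11, 2), 2))  # 0.82
```

Because the source says "under two" minutes, 82% is a lower bound on the reduction rather than an exact figure.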

The numbers worth treating with caution: the FTE-equivalent figure is Klarna's framing of throughput, not a portable productivity metric. Other companies' agents will produce different numbers because their interactions, languages, and authority scopes differ. The portable lessons are architectural, not numerical.

The next chapter is the cautionary case — JPMorgan — which is also a success story, told as a lesson in patience.

[Figure: Klarna's actual arc. Deploy → calibration → public reveal.]
Feb 2024, deploy: 700 FTE-equiv; 2.3M chats / month; 35 languages · 23 markets; 11 → <2 min resolution; $40M projected profit lift.
May 2025, calibration: add human tier; "cost weighed too heavily"; flexible / remote model; scoped to complex / empathic; agent retained throughout.
Nov 2025, public reveal: 853 FTE · $60M/yr; first earnings call; 5,500 → <3,000 headcount; revenue doubled; CSAT on par with humans.
Press read May 2025 as a reversal. Earnings revealed it was a calibration on a working architecture.
Figure 15.1 Klarna timeline: deploy (Feb 2024), public walkback (May 2025), public earnings reveal (Nov 2025). The architecture remained throughout; the operational design adjusted at one point.