"Human-in-the-loop" is the most-used and least-defined phrase in the field. It appears in every vendor deck and almost every regulatory submission. In production, it usually means one of three things — only one of which actually catches anything.
A comforting phrase
The phrase becomes empty when the human in the loop has no real authority to stop the action, no time to evaluate it, no information to evaluate it with, and no consequence for waving it through. We have seen all four of those failures in production. The most common is the third: the human is shown a "summary" of the agent's reasoning and a button labelled "approve" — and the summary is exactly what the agent wants the human to see, generated by the same agent whose work is being approved. That is not oversight. That is collaborative theatre.
The phrase becomes useful when the human is doing one of three specific things: catching a confabulation before it reaches a customer; making a decision the agent is not authorised to make; or creating a record that this particular case was reviewed by a named human, for audit purposes. Those are real functions. Each requires a different design.
Three tiers of human oversight
The three useful patterns of HITL, in increasing order of cost:
Tier 1 — Exception escalation. The agent works autonomously and escalates to a human only when its own confidence is low, when the action it would take is consequential beyond a defined threshold, or when a customer signals dissatisfaction. The human handles the high-judgement minority of cases. This is the cheapest tier and the right default for high-volume, low-stakes work. Klarna has reported that its AI assistant handles about two-thirds of customer service chats in this pattern; the remaining third routes to humans because those cases require emotional judgement.
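The three escalation triggers can be written as a single routing rule. This is a minimal sketch, assuming the agent reports a confidence score and an estimated dollar consequence for each proposed action; the class, field names, and both thresholds are hypothetical illustrations, not values from any real deployment.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; tune per deployment.
CONFIDENCE_FLOOR = 0.80
CONSEQUENCE_CEILING_USD = 500.0


@dataclass
class ProposedAction:
    confidence: float            # agent's self-reported confidence, 0..1
    consequence_usd: float       # estimated monetary impact of the action
    customer_dissatisfied: bool  # e.g. a complaint or a request for a human


def escalation_reasons(action: ProposedAction) -> list[str]:
    """Return why this case should go to a human; an empty list means proceed autonomously."""
    reasons = []
    if action.confidence < CONFIDENCE_FLOOR:
        reasons.append("low_confidence")
    if action.consequence_usd > CONSEQUENCE_CEILING_USD:
        reasons.append("over_consequence_threshold")
    if action.customer_dissatisfied:
        reasons.append("customer_dissatisfied")
    return reasons


print(escalation_reasons(ProposedAction(0.95, 40.0, False)))  # routine: handled by the agent
print(escalation_reasons(ProposedAction(0.60, 900.0, False)))  # escalated for two reasons
```

Returning the triggered reasons rather than a bare boolean tells the human why the case reached them, which matters once reviewers are handling only the hard minority.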
Tier 2 — Post-action sampling. The agent acts, and a sample of its actions is reviewed by humans after the fact. This catches drift and provides audit evidence without slowing throughput. Sampling can be random, risk-weighted (any high-blast-radius action is sampled at 100%), or anomaly-driven (any action flagged by the online metrics from Chapter 10). This tier is the right default for medium-stakes, mostly-reversible work.
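The three sampling modes combine naturally into one decision per completed action. A minimal sketch, assuming each action carries a high-blast-radius flag and an anomaly flag from the online metrics; the function name and the 2% baseline rate are hypothetical.

```python
import random

# Hypothetical baseline: fraction of routine actions reviewed at random.
RANDOM_SAMPLE_RATE = 0.02


def should_sample(high_blast_radius: bool, anomaly_flagged: bool,
                  rng: random.Random, rate: float = RANDOM_SAMPLE_RATE) -> bool:
    """Decide whether a completed action goes to post-action human review."""
    if high_blast_radius:
        return True                 # risk-weighted: sampled at 100%
    if anomaly_flagged:
        return True                 # anomaly-driven: anything the online metrics flag
    return rng.random() < rate      # random baseline sample, for drift detection


rng = random.Random(42)  # seeded so the sample can be reproduced for audit
reviewed = sum(should_sample(False, False, rng) for _ in range(10_000))
print(f"{reviewed} of 10,000 routine actions sent for review")
```

Passing an explicit, seeded `random.Random` rather than using the module-level generator means an auditor can later reproduce exactly which actions were drawn.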
Tier 3 — Pre-action approval. The agent stops before acting and a human must approve the action. This is the most expensive tier and the only one that catches a confabulation before it reaches the world. It should be reserved for actions that are irreversible and consequential — refunds above a threshold, communications to external parties, data deletions, financial transactions. Pre-action approval works only when the human has the time, information, and authority to actually say no.
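In code, a pre-action gate is simply an action that cannot execute until a named human returns a decision. This is a minimal sketch, assuming the organisation keeps an explicit list of irreversible, consequential action types; the set contents, `run_with_gate`, and the `ask_human` callback are hypothetical. The gate enforces the stop, but it cannot check whether the reviewer actually had the time, information, and authority to say no.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    APPROVE = "approve"
    REFUSE = "refuse"


# Hypothetical list of action types classed as irreversible and consequential.
REQUIRES_APPROVAL = {"refund_over_threshold", "external_communication",
                     "data_deletion", "financial_transaction"}


@dataclass
class GateResult:
    executed: bool
    reviewer: Optional[str]       # named human, kept for the audit record
    decision: Optional[Decision]


def run_with_gate(action_type: str, execute: Callable[[], None],
                  ask_human: Callable[[str], tuple[str, Decision]]) -> GateResult:
    """Run `execute` only after a named human approves a gated action type."""
    if action_type not in REQUIRES_APPROVAL:
        execute()                                  # ungated: no human consulted
        return GateResult(True, None, None)
    reviewer, decision = ask_human(action_type)    # blocks until a human decides
    if decision is Decision.APPROVE:
        execute()
        return GateResult(True, reviewer, decision)
    return GateResult(False, reviewer, decision)   # refusal: the action never happens
```

Recording the reviewer's name on every gated result is what turns the gate into the audit record described earlier in this chapter.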
Five tests for a real checkpoint
Before relying on a Tier 3 pre-action approval as a control, run it through five tests. (1) Authority: does the human have the formal power to refuse? (2) Information: does the human see the agent's reasoning, the source data, and the predicted consequence — not a self-summary? (3) Time: does the human have enough time to actually evaluate, given the volume? (4) Independence: is the human's evaluation generated independently of the agent's framing, or is the human being primed by the agent's chosen narrative? (5) Feedback: when the human refuses, does that refusal feed back into the eval suite (Chapter 10) and the Map artefact (Chapter 9)?
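Keeping the five answers as a record per checkpoint, rather than as a one-off review, lets a team re-run the check whenever volume or tooling changes. A minimal sketch, assuming each test has already been reduced to a yes/no judgement made by a person; the class and field names are hypothetical, and the code only reports failures, it does not decide them.

```python
from dataclasses import dataclass, fields


@dataclass
class CheckpointDesign:
    """Answers to the five tests for one Tier 3 checkpoint (hypothetical schema)."""
    authority: bool     # can the reviewer formally refuse?
    information: bool   # reasoning, source data, predicted consequence; not a self-summary
    time: bool          # enough time per item at the actual volume
    independence: bool  # evaluation not primed by the agent's chosen narrative
    feedback: bool      # refusals feed the eval suite and the Map artefact


def failed_tests(design: CheckpointDesign) -> list[str]:
    """Names of the tests this checkpoint fails; any entry means it is theatrical."""
    return [f.name for f in fields(design) if not getattr(design, f.name)]


print(failed_tests(CheckpointDesign(True, True, False, True, True)))  # the time test fails
```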
If any of those five fails, the checkpoint is theatrical. It will pass the audit and miss the real failures. The most common failure is the third: the volume is too high and the human is granted seconds, not minutes, to evaluate each action. At that point the checkpoint is a rubber stamp.
If your HITL design depends on a human reviewing more than about 50 routine items per hour, the human is not reviewing — they are pattern-matching for the obvious failures and missing the subtle ones. The right response is either to automate the routine majority (Tier 1) or to staff the review function properly. There is no third option that produces real oversight.
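The 50-items-per-hour ceiling translates directly into seconds per item and headcount. A sketch of that arithmetic; the ceiling comes from the text, while the daily volume and productive hours in the usage lines are hypothetical inputs.

```python
import math

# Ceiling from the text; all other numbers below are hypothetical inputs.
MAX_ITEMS_PER_HOUR = 50


def seconds_per_item(items_per_hour: float) -> float:
    """Time a reviewer can spend on each item at a given throughput."""
    return 3600 / items_per_hour


def reviewers_needed(items_per_day: int, productive_hours: float,
                     max_rate: float = MAX_ITEMS_PER_HOUR) -> int:
    """Minimum headcount that keeps every reviewer at or below the ceiling."""
    return math.ceil(items_per_day / (max_rate * productive_hours))


print(seconds_per_item(50))        # at the ceiling
print(seconds_per_item(600))       # a rubber-stamp pace
print(reviewers_needed(4000, 6))   # 4,000 routine items/day, 6 review hours per person
```

At 50 items per hour a reviewer gets 72 seconds per item; at 600 per hour, 6 seconds, which is the "seconds, not minutes" regime described above. A queue of 4,000 routine items a day with six productive review hours per person needs 14 reviewers just to stay at the ceiling, a figure that usually makes the case for moving the routine majority to Tier 1.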
Automation bias and the fatigue problem
Automation bias is the well-documented tendency for humans to over-trust the recommendations of automated systems, particularly when the system is usually right. NIST AI 600-1 files it under risk category 7, Human-AI Configuration. The mechanism is mundane. The agent is right 95% of the time. After a few hundred reviews, the reviewer learns that approving is almost always the right answer, and their attention drifts. The 5% of cases where the agent is wrong now pass through the human checkpoint almost untouched: the agent's 5% error rate becomes, in effect, the checkpoint's 5% miss rate.
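The arithmetic behind that drift is short. A sketch, under the simplifying assumption (not from the text) that a reviewer catches a fixed fraction of the agent's errors regardless of the case; the function names are hypothetical.

```python
def blind_approval_accuracy(agent_error_rate: float) -> float:
    """How often 'approve' is the right call if the reviewer never reads the case."""
    return 1.0 - agent_error_rate


def residual_error_rate(agent_error_rate: float, reviewer_catch_rate: float) -> float:
    """Fraction of all actions that are wrong AND approved at the checkpoint.

    Simplifying assumption: the reviewer catches a fixed fraction of errors,
    independent of the case.
    """
    return agent_error_rate * (1.0 - reviewer_catch_rate)


# The scenario in the text: the agent is wrong 5% of the time.
print(f"blind approval is right {blind_approval_accuracy(0.05):.0%} of the time")
for catch_rate in (0.0, 0.5, 0.8):
    print(f"catch rate {catch_rate:.0%}: residual error {residual_error_rate(0.05, catch_rate):.1%}")
```

The first line is the trap: rubber-stamping is rewarded 95% of the time. A reviewer who approves by habit (catch rate 0) leaves the full 5% in place; one who still catches 80% of errors brings it down to 1%.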
Two design responses help. The first is a forcing function: requiring the human to enter a specific piece of information that proves they engaged with the case (the customer's actual name, the dollar amount, the source document line). The second is anomaly highlighting: instead of asking the human to evaluate every case from scratch, the agent (or a separate model) flags what is unusual about this case relative to the agent's typical behaviour. The reviewer's job becomes pattern-breaking rather than pattern-confirming.
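Both responses fit in a few lines. This sketch assumes the case record carries a dollar amount and that the agent's past actions are logged as numeric fields; the function names, the use of a z-score, and the threshold of 3 are hypothetical choices, one simple way among many to define "unusual".

```python
import statistics
from decimal import Decimal, InvalidOperation


def forcing_function_ok(typed_amount: str, case_amount: Decimal) -> bool:
    """Accept an approval only if the reviewer re-typed the case's dollar amount."""
    try:
        return Decimal(typed_amount.strip().lstrip("$").replace(",", "")) == case_amount
    except InvalidOperation:
        return False  # empty or unparseable input never counts as engagement


def anomaly_flags(case: dict[str, float], history: dict[str, list[float]],
                  z_threshold: float = 3.0) -> list[str]:
    """Name the fields where this case sits far outside the agent's typical behaviour."""
    flags = []
    for field, value in case.items():
        past = history.get(field, [])
        if len(past) < 2:
            continue                    # too little history to judge
        mu, sigma = statistics.mean(past), statistics.stdev(past)
        if sigma == 0:
            if value != mu:
                flags.append(field)     # any deviation from a constant is unusual
            continue
        if abs(value - mu) / sigma > z_threshold:
            flags.append(field)
    return flags


history = {"refund_usd": [20, 25, 30, 22, 28, 24, 26]}
print(forcing_function_ok("$1,250.00", Decimal("1250")))
print(anomaly_flags({"refund_usd": 400.0}, history))
```

Showing the reviewer the flagged fields, rather than the agent's own summary of the case, points attention at whatever breaks the pattern.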
None of this removes automation bias entirely. The honest framing is that pre-action HITL is a partial mitigation, not a guarantee. The other mitigations (eval suites, scoped credentials, sampled review, anomaly monitoring) are still required. HITL is one layer of defence in depth, and it is most effective when it is reserved for the smallest set of actions that genuinely need it.
Part II ends here. The artefacts you should now have, at least in draft, are: a Map artefact per agent, an Action-Consequence Map per tool, an eval suite with a golden set, an incident playbook with a tested kill-switch, an agent registry with the minimum fields, a risk catalogue mapped to OWASP and ATLAS, and a HITL design that survives the five tests above. These are the conditions under which Part III's deployment work actually compounds. Without them, the case studies that follow read as cautionary tales rather than templates.