In June 2024, McDonald's terminated its AI drive-thru pilot. The internet treated it as a viral failure. The Museum of Failure took a more interesting view: McDonald's did this right. This chapter is about why, and what to copy.
What happened
In 2021, McDonald's partnered with IBM to test Automated Order Taking (AOT), an AI-powered voice-ordering system for drive-thrus. The system ran at more than 100 U.S. restaurants over a trial lasting nearly three years. In June 2024, McDonald's terminated the IBM AOT partnership and removed the technology from all testing locations by July 26, 2024. The official statement said the company would "explore voice ordering solutions more broadly" and find a new partner.
The pilot's most-shared moments were TikTok videos of AOT getting orders wrong: piling hundreds of chicken nuggets onto a single order, ignoring customers who asked it to stop. They were genuinely funny and genuinely embarrassing. They also came from a sample of locations small enough that the embarrassment scaled with the meme economy, not with the business.
What the press saw
The press read the rollback as a failure of AI, of IBM, of McDonald's, or of all three. The story fit a familiar template: ambitious technology meets the real world, fails comically, gets pulled. Each retelling reinforced the template. Within weeks, "McDonald's drive-thru AI" became shorthand for AI in general not being ready for the real world.
That reading is wrong in three places. First, AOT failed in a sample of restaurants, not across the chain. Second, because the sample was small by design, the failure was discovered before it cost the company materially. Third, the decision to terminate without sunk-cost paralysis is, by itself, evidence of a healthy AI program: most enterprises are unable to kill projects that have run for three years and cost millions.
What actually went right
The Museum of Failure made the point most directly: "McDonald's did this right. They tested at only 100 locations. There are over 40,000 McDonald's locations. If they had installed this expensive tech at all locations at enormous cost only to find out it didn't work, that would have been a failure." Running a controlled experiment at roughly 0.25% of the estate (100 of 40,000 locations), measuring real-world outcomes, and pulling the plug without sunk-cost bias is exactly how enterprise AI pilots should be run.
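The 0.25% figure follows directly from the two numbers in the quote. As a minimal sketch of that arithmetic, taking the rounded figures of 100 pilot sites and 40,000 total locations as given (the function name `pilot_share` is illustrative, not from any source):

```python
def pilot_share(pilot_sites: int, total_sites: int) -> float:
    """Fraction of the estate exposed to the pilot."""
    if total_sites <= 0 or not 0 <= pilot_sites <= total_sites:
        raise ValueError("need 0 <= pilot_sites <= total_sites and total_sites > 0")
    return pilot_sites / total_sites


# Rounded figures from the Museum of Failure quote.
share = pilot_share(100, 40_000)
print(f"Pilot exposure: {share:.2%}")  # Pilot exposure: 0.25%

# How many times larger a chain-wide rollout would have been than the pilot.
rollout_multiple = 40_000 / 100
print(f"Full rollout = {rollout_multiple:.0f}x the pilot")  # Full rollout = 400x the pilot
```

The second number is the useful one for a steering committee: whatever the pilot cost per site, being wrong at full scale would have meant being wrong about 400 times over.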
The deeper signal is the operational discipline behind the visible decision. Killing a three-year pilot with an outside vendor, in public, typically requires five things: a pre-committed go/no-go threshold, a measurement programme that produces credible numbers, an executive owner empowered to terminate, a public-relations posture that does not double down out of pride, and a procurement framework that allows graceful contract exit. None of those are AI capabilities. All of them are organisational capabilities, and they are the reason the cost of being wrong was bounded.
The most expensive form of "AI failure" is not the failed pilot. It is the pilot that should have failed and was instead allowed to ship. McDonald's chose embarrassment at 0.25% of the estate over operational failure at 100%. That trade is almost always correct. The teams that get it wrong are the ones that cannot stomach a public termination, and so accept a much larger private failure.
The pilot template
The template McDonald's accidentally produced for the field is short. Start small enough that termination is bearable. One percent of the estate is generous; 0.1–0.5% is responsible. Define termination criteria in advance, in writing, in numbers, and tie them to a measurement programme that produces those numbers without ambiguity. Empower the executive owner to terminate without escalating to the board; an owner who must seek board approval to kill a project will almost always extend it instead. Pre-write the public statement for both outcomes, success and termination, so the communications are not authored under emotional pressure.
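"In writing, in numbers" can be taken literally. The sketch below shows one way to record termination criteria as data and evaluate them mechanically. Everything in it is an assumption made for illustration: the metric names, the thresholds, and the observed readings are hypothetical and are not figures from the AOT pilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One pre-committed termination criterion, fixed before the pilot starts."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold


def go_no_go(criteria: list[Criterion], observed: dict[str, float]) -> tuple[str, list[str]]:
    """Return ("continue" or "terminate", names of the criteria that failed).

    A metric with no measurement counts as a failure: if the measurement
    programme cannot produce the number, the criterion has not been met.
    """
    failed = [
        c.metric
        for c in criteria
        if c.metric not in observed or not c.met(observed[c.metric])
    ]
    return ("terminate" if failed else "continue", failed)


# Hypothetical thresholds and readings, for illustration only.
criteria = [
    Criterion("order_accuracy", 0.95),
    Criterion("staff_intervention_rate", 0.10, higher_is_better=False),
    Criterion("seconds_per_order", 45.0, higher_is_better=False),
]
observed = {"order_accuracy": 0.85, "staff_intervention_rate": 0.22}

decision, failed = go_no_go(criteria, observed)
print(decision, failed)
```

Treating a missing measurement as a failed criterion is a deliberate design choice in this sketch: it keeps an ambiguous or absent number from becoming a reason to extend the pilot.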
Two corollaries. The first: do not pilot what you cannot afford to pull. If your pilot would be too expensive to terminate after a year because of training costs, integration, or political capital, your pilot is a launch in disguise. The second: evaluate pilots against business outcomes (order accuracy, throughput, customer experience) and not against AI quality metrics in isolation. AOT's measured failure was operational; the right metric to terminate on was the operational metric.
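The first corollary can be checked before the pilot starts. As a rough sketch, assuming the team can estimate what it would write off on termination and has agreed a walk-away budget in advance (both the cost categories and every figure here are hypothetical; political capital does not reduce to a number and is left out):

```python
def is_launch_in_disguise(write_off_costs: dict[str, float], walk_away_budget: float) -> bool:
    """True if terminating would cost more than the organisation agreed it could absorb."""
    return sum(write_off_costs.values()) > walk_away_budget


# Hypothetical one-year estimates, in the same currency unit as the budget.
write_off_costs = {"training": 1_200_000.0, "integration": 2_500_000.0}

print(is_launch_in_disguise(write_off_costs, walk_away_budget=5_000_000.0))  # False
print(is_launch_in_disguise(write_off_costs, walk_away_budget=3_000_000.0))  # True
```

If the check returns `True` at planning time, the honest options are to shrink the pilot or to call it a launch and govern it as one.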
The next chapter is the platform map: which no-code, low-code, and code-first platforms exist, what they are good at, and where the lock-in risks bite.