Most SMB AI pilots fail. The published failure rates I've seen range from 60 to 80%, a figure I haven't independently verified but that is roughly consistent with what I've watched in the field. But the failure rate isn't the interesting part. The interesting part is that the failures are predictable and most are preventable.
This post is the failure taxonomy, ranked by how often I see each pattern, with the three things that consistently save the projects that survive. The general pillar on why AI projects fail covers the broader frame. This is the pilot-specific version.
After enough engagements, the failures cluster into a small set. Here are the eight most common, ranked by how often I see each.
Failure 1: no operating owner. By far the most common. The pilot has a build owner during development. The pilot has a sponsor on the business side. The pilot doesn't have a named person who'll spend 5 to 10 hours a week operating the agent after handoff.
The agent ships. It works for two weeks. It drifts in week 4 because the model behavior shifts subtly or the data distribution changes. Nobody notices because nobody owns noticing. By week 8, the agent is unreliable. By week 12, the team has stopped using it.
The cause is almost always that the owner question wasn't asked properly in the kickoff. The answer given was "the team", "we'll figure it out", or "the vendor will support us". None of these is a real answer.
Save rate: about 70% of these failures can be saved by naming an owner late and giving them the time. The 30% that can't be saved usually involve agents that have already lost team trust beyond repair.
Failure 2: over-ambitious scope. The business owner saw the demos, got excited, and scoped a build that touches three systems, has a real-time customer-facing component, and depends on data the team doesn't yet have in clean form. The build runs over budget, gets descoped halfway through so that something ships, and what ships is the worst version of what was originally proposed.
The pattern is consistent: ambition at the start, descope under deadline pressure, ship something that nobody likes.
The fix is upstream: scope the first build to the boring 60-day version, not the ambitious 6-month version. I've covered this in where to start with AI. The first build needs to be a clean win on the board, not a heroic one.
Failure 3: underestimated integration. The agent works fine in isolation. Hooking it into the actual production systems takes 3 to 5 times longer than estimated. The team runs out of budget, the integration is half-finished, and the agent ends up reduced to a manual copy-paste step in someone's workflow.
I've covered the cost-breakdown view of this in cost to build a custom AI agent. Integration is the line item most often under-priced.
Save rate: the lowest of the eight patterns. Once integration costs spiral, it's very hard to recover the project. Usually the right call is to stop, redesign with a clearer integration architecture, and accept that the original build was a learning exercise.
Failure 4: vague success criteria. The success criteria were "improve customer satisfaction" or "save the team time" or "increase efficiency". After the build ships, nobody can tell whether it's working. The team debates for months whether to keep going.
This pattern doesn't kill the project outright. It kills the project's funding 18 months in, when leadership reviews the AI investments and can't find clear wins.
The fix is two specific numeric metrics, picked before kickoff and signed off by the operating owner: one quality metric and one efficiency metric. Anything more is smuggled vagueness.
Failure 5: no eval harness. The agent ships without a real eval harness. The team eyeballs a few outputs, decides they look reasonable, and pushes to production. Three weeks later, the agent does something embarrassing on a real customer-impacting case. Trust drops. The team can't tell whether the agent is getting better or worse over time because there's no metric to track.
This pattern is more about a slow death than a fast failure. The agent doesn't get killed; it just stops being trusted. Eventually it gets retired without anyone formally calling it.
Recovery is possible: build the eval harness late, run it against historical data, establish a baseline. Sometimes you find the agent is actually performing well and team perception is the problem; sometimes you confirm it's degraded and the build needs revisiting.
Failure 6: customer-facing first build. The first build was customer-facing real-time chat. It failed in a visible way in week 2. The team retreated from AI for a year.
This is a sub-pattern of failure 2 (over-ambitious scope) but worth calling out specifically because the customer-facing first build is unusually common as a misstep, and the brand damage is unusually high.
The fix is upstream and structural: don't let the first build be customer-facing. Internal-first is the rule.
Failure 7: model deprecation. The agent shipped using a specific model from a specific vendor. Six months in, the vendor deprecated the model. The team had to scramble to swap to a successor. The swap exposed prompt-engineering quirks that were specific to the old model, the agent's behavior changed in subtle ways, and the team lost confidence.
Newer failure mode that's getting more common. Mitigation is to design the agent so model swaps are routine, not surprises. That means model abstraction in the code, an eval harness that you can re-run on a candidate model, and a budget line item for model swap work every 12 to 18 months.
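To make "model swaps are routine" concrete, here is a minimal Python sketch of the idea, not a definitive implementation. The `TextModel` interface, `EvalCase`, `pass_rate`, `safe_to_swap`, and the `StubModel` stand-in are all hypothetical names I'm introducing for illustration; a real build would wrap whatever vendor client it actually uses behind the same interface. The 2-point tolerance in `max_drop` is an assumed value, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class TextModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EvalCase:
    prompt: str
    passes: Callable[[str], bool]  # returns True if the output is acceptable


def pass_rate(model: TextModel, cases: List[EvalCase]) -> float:
    """Fraction of eval cases whose output passes its check."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for case in cases if case.passes(model.complete(case.prompt)))
    return passed / len(cases)


def safe_to_swap(current: TextModel, candidate: TextModel,
                 cases: List[EvalCase], max_drop: float = 0.02) -> bool:
    """True if the candidate scores no more than max_drop below the current model."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - max_drop


@dataclass
class StubModel:
    """Stand-in for a real vendor client; returns canned answers."""
    name: str
    answers: Dict[str, str]

    def complete(self, prompt: str) -> str:
        return self.answers.get(prompt, "")


cases = [
    EvalCase("Classify: refund request", lambda out: out == "billing"),
    EvalCase("Classify: password reset", lambda out: out == "account"),
]
old = StubModel("old-model", {"Classify: refund request": "billing",
                              "Classify: password reset": "account"})
# The candidate changes capitalization, the kind of quirk a swap exposes.
new = StubModel("new-model", {"Classify: refund request": "billing",
                              "Classify: password reset": "Account"})
print(pass_rate(old, cases), pass_rate(new, cases), safe_to_swap(old, new, cases))
# prints: 1.0 0.5 False
```

Because the agent code only ever sees `TextModel`, swapping vendors means writing one new adapter and re-running the same eval set, which is the work the 12 to 18 month budget line would pay for.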
Failure 8: correct outputs the team doesn't trust. The agent's outputs were technically correct but disagreed with the team's intuition about cases the team felt confident about. The team didn't trust the agent and continued doing the work manually as a check. Within a few months they stopped using the agent because the check work negated the time savings.
Trickier failure mode. The fix is in the eval design: the agent has to demonstrate competence on cases the team knows well before they'll trust it on cases they don't. This means a soft launch with explicit human review for the first month, not a hard cutover.
Across the 20 to 30% of SMB AI pilots that ship successfully and stay in production, three things show up consistently.
Save 1: a named operating owner. The owner is one specific person. Their manager has signed off on the time budget. The owner spends 5 to 10 hours a week in the first quarter watching outputs, tuning prompts, and reviewing edge cases. After the first quarter, this drops to 1 to 2 hours a week.
The owner is usually the line manager of the workflow being augmented, not the CTO and not a vendor. The line manager has the context to know when the agent is right and when it's wrong.
Save 2: a boring first build. The first build is unglamorous. One input source, one output target, asynchronous, internal, with human review. The kind of build that pays back in 8 to 12 months on a $20,000 to $35,000 budget.
The team treats the first build as a learning project, not a transformation. They write down what they learn so the second build is faster.
Save 3: an eval harness from the start. The team builds a fixed eval set during scoping. They run it on the agent before launch. They run it after every prompt change. They track the metric over time. They alert when it moves.
The eval harness is unglamorous and consumes budget the vendor would have spent on more features. Worth it every time. The pilots without an eval harness can't tell whether they're succeeding; the pilots with one know in real time.
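The "track the metric, alert when it moves" loop is small enough to sketch. This is a hedged Python illustration under my own assumptions: the `EvalHistory` class, the 0.92 baseline, the 5-point `alert_drop` tolerance, and the dates are all hypothetical, and the score itself would come from running the fixed eval set against the agent.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class EvalHistory:
    """Scores from re-running the same fixed eval set over time."""
    baseline: float            # pre-launch score on the fixed eval set
    alert_drop: float = 0.05   # assumed tolerance; pick your own
    runs: List[Tuple[date, str, float]] = field(default_factory=list)

    def record(self, day: date, reason: str, score: float) -> bool:
        """Store a run; return True when the score has dropped enough to alert."""
        self.runs.append((day, reason, score))
        return score < self.baseline - self.alert_drop


history = EvalHistory(baseline=0.92)
print(history.record(date(2025, 1, 6), "prompt change", 0.93))  # prints: False
print(history.record(date(2025, 2, 3), "weekly run", 0.84))     # prints: True
```

The point of keeping every run, not only the latest one, is that the operating owner can see whether a drop followed a prompt change or appeared on its own.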
These three saves are not technical. They're project discipline. The pilots that fail aren't failing because the technology doesn't work. They're failing because the project wasn't run as a project. I've covered this in the broader pillar on AI project failure and in how to de-risk your first AI project; the patterns repeat because the failures repeat.
There is a clear set of conditions under which killing a failing pilot is the right call, not a desperate one.
The owner question can't be answered. After 4 to 6 weeks of trying, no one in the business has the capacity to own operating the agent. Killing now saves the operating cost.
The scope keeps growing. The original 60-day build is in month 4 and the team is still finding "one more integration we need". The build has lost coherence. Better to stop, draw a tight box around what's done, and ship that as a smaller pilot.
The eval metric is below the threshold and not improving. After three rounds of prompt tuning the quality is still below the bar the team set. The model isn't right for the task or the workflow is not AI-shaped. Cut losses.
Trust has been lost beyond recovery. The team is actively working around the agent. The damage is done; nothing the project does will rebuild that trust. Start a different project with a different framing.
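Of the four conditions, the eval-metric one is the easiest to write down as a rule before the build starts, so nobody has to argue about it later. A minimal Python sketch, assuming one eval score before tuning and one after each tuning round; the function name `should_stop` and the `min_gain` cutoff for what counts as "improving" are my assumptions, not part of any standard method.

```python
from typing import List


def should_stop(scores: List[float], bar: float, rounds: int = 3,
                min_gain: float = 0.01) -> bool:
    """scores[0] is the pre-tuning score, followed by one score per tuning round.

    Returns True when the last `rounds` tuning rounds all scored below the bar
    and the net gain across them was smaller than `min_gain`.
    """
    if len(scores) < rounds + 1:
        return False  # not enough tuning rounds yet to judge
    recent = scores[-(rounds + 1):]
    below_bar = all(score < bar for score in recent[1:])
    improving = recent[-1] - recent[0] >= min_gain
    return below_bar and not improving


print(should_stop([0.70, 0.71, 0.70, 0.70], bar=0.85))  # prints: True
print(should_stop([0.70, 0.74, 0.78, 0.82], bar=0.85))  # prints: False
```

The second case is still below the bar but clearly climbing, so the rule says keep tuning rather than cut losses.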
The honest version of "killing a pilot" usually saves the AI program for the SMB. Letting bad pilots limp along consumes the budget that the next, better-scoped project needs.
Killing well is also a leadership move. I've written about the broader failed product lessons in the consumer context; the same instincts apply here. Stopping at the right time is usually the unsung correct decision.
If you're scoping your first AI pilot, run through this list before kickoff.
Have you named a specific operating owner? If no, fix it before signing.
Is the scope a 60 to 90 day boring win? If no, descope.
Will the build include an eval harness with deliverables? If no, add it to the budget.
Are the success metrics two numeric ones, signed by the operating owner? If no, write them down.
Have you scoped integration at 30 to 40% of the total budget? If no, the integration is under-priced.
Five questions. Three of them map to the three saves above. The other two are upstream prevention for the most common failure modes. If you can answer yes to all five before the build starts, you're in the small group that ships successful AI pilots. The math on that group is good. The math on the rest is not.
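If it helps to make the checklist hard to skip, the five questions can be encoded as a pre-kickoff check. This is only a sketch: the `PilotPlan` fields and the `preflight_gaps` function are hypothetical names, and the check uses the lower end of the 30 to 40% integration range as its floor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PilotPlan:
    operating_owner: Optional[str]   # a named person, not "the team"
    scope_days: int
    has_eval_harness: bool
    numeric_metrics: List[str]
    metrics_signed_by_owner: bool
    total_budget: float
    integration_budget: float


def preflight_gaps(plan: PilotPlan) -> List[str]:
    """Return the checklist items that still need fixing; empty means ready."""
    gaps = []
    if not plan.operating_owner:
        gaps.append("name a specific operating owner")
    if plan.scope_days > 90:
        gaps.append("descope to a 60 to 90 day build")
    if not plan.has_eval_harness:
        gaps.append("add an eval harness to the budget")
    if len(plan.numeric_metrics) != 2 or not plan.metrics_signed_by_owner:
        gaps.append("write down two numeric metrics, signed by the owner")
    if plan.total_budget <= 0 or plan.integration_budget / plan.total_budget < 0.30:
        gaps.append("integration is under-priced (below 30% of budget)")
    return gaps


plan = PilotPlan(
    operating_owner=None,
    scope_days=180,
    has_eval_harness=True,
    numeric_metrics=["accuracy on eval set", "minutes saved per ticket"],
    metrics_signed_by_owner=True,
    total_budget=30_000,
    integration_budget=5_000,
)
for gap in preflight_gaps(plan):
    print(gap)
```

With the example plan above, three gaps come back: no owner, a scope that is too long, and an integration share of roughly 17%.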
FREQUENTLY ASKED