Why AI Falls Apart at 30% — And the System That Fixes It
Why does AI nail the first 30% of a task… then completely fall apart?
I kept hitting this wall until I changed my approach. Not a better prompt. A better system.
The Problem Every Builder Knows
If you’ve used AI for anything beyond trivial tasks, you’ve felt this pain. You describe a complex feature. AI starts strong — scaffolding files, writing functions, making real progress. Then somewhere around the 30% mark, things go sideways.
The context builds up. Requirements get lost in the noise. And by the end, you’re left with:
- Quietly dropped work that AI never mentioned
- Gaps it doesn’t even notice
- A confident “I’ve completed the task” when it clearly hasn’t
Sound familiar?
The root cause is straightforward: complicated tasks need detailed plans, but as AI works through them, context accumulates. The model has more and more text to keep track of, so important details from your original requirements fade into the background while recent code changes dominate its attention.
The Insight That Changed Everything
Here’s what I realized: AI doesn’t need to do it perfectly in one shot.
It needs to do the same task repeatedly, with a fixed spec to guide it, and a way to validate completion.
Three words: Spec-driven. Iterative. Verifiable.
Why does repetition help? Because the spec stays the same, but the codebase changes.
On attempt two, AI sees what it built in attempt one. It reads the files. It spots gaps. It fixes bugs. It adds what’s missing. Each pass gets closer to the spec. The system becomes self-correcting.
This flips the typical AI workflow on its head. Instead of trying to craft the perfect prompt that gets everything right the first time, you design a system where “good enough, repeatedly” converges on “actually complete.”
The Three Pillars
The method rests on three pillars. Miss any one of them, and it falls apart.
Pillar 1: The Spec
Everything starts with a detailed plan that AI can reference on every iteration.
Start a chat with your AI coding tool. Describe your requirements in detail: what you’re building, why, how features should interact, what the edge cases are. Go back and forth until AI can restate the plan back to you without gaps or wrong assumptions.
Then save the plan to a file: ./task-plan.md
This becomes your source of truth. Every iteration references this file, not the fading memory of earlier conversation turns.
But here’s the thing: your first spec won’t be perfect. That’s fine.
Before you start implementation, ask AI to review the plan critically:
- Is the architecture sound?
- Are features decoupled from each other?
- Are there thorough test cases for each requirement?
- Is every requirement specific and verifiable?
Iterate on the plan before you iterate on the code. Time spent here pays dividends later. A vague spec produces vague results, no matter how many iterations you run.
Pillar 2: Repetition
Once your spec is solid, the execution loop is simple.
Give AI the same instruction every time: “Implement everything in ./task-plan.md. Check off completed items. Stop when all tests pass.”
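The “check off completed items” part of that instruction also gives you a cheap progress signal between passes. Here is a minimal sketch, assuming the plan uses Markdown-style checkboxes ("- [ ]" and "- [x]"); that format, the checklist_progress helper, and the sample plan are my own illustration, not something any tool requires.

```python
import re

# Matches Markdown task-list items such as "- [ ] Add login" or "- [x] Add login".
# Assumption: the plan file tracks requirements as checkboxes like these.
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def checklist_progress(plan_text: str) -> tuple[int, int, list[str]]:
    """Return (done, total, remaining item texts) for a checklist-style plan."""
    done = 0
    remaining = []
    for line in plan_text.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # headings and prose are ignored
        if match.group(1).lower() == "x":
            done += 1
        else:
            remaining.append(match.group(2).strip())
    return done, done + len(remaining), remaining


# Hypothetical plan content, for illustration only.
SAMPLE_PLAN = """\
# Checkout flow
- [x] Cart total calculation
- [ ] Coupon validation
- [ ] Payment retry on timeout
"""

if __name__ == "__main__":
    done, total, remaining = checklist_progress(SAMPLE_PLAN)
    print(f"{done}/{total} items checked off")
    for item in remaining:
        print("todo:", item)
```

A checked box only means AI claims the item is done, so treat this as a progress indicator, not as proof of completion.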
First pass: maybe 20% done. AI scaffolds the basics, gets some features working, then runs out of steam or context.
Second pass: it reads the codebase, sees what exists, and picks up where it left off. Another 25% done.
Third pass: it catches bugs from earlier work, fills gaps it missed, adds the edge cases.
Keep going until the spec is fulfilled.
You can do this manually right now with any AI coding tool:
- Run AI with your spec
- When it stops, clear the context
- Run the exact same prompt again
- AI reads the updated codebase and continues the work
- Repeat until the spec is fulfilled
No special tools needed. Just discipline.
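If you’d rather script those steps than click through them, the loop can be sketched roughly like this. It is a sketch under assumptions: run_agent stands in for however your tool starts a fresh session with a prompt, tests_pass stands in for your test command, and the names run_until_green and command_succeeds are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# The same instruction is reused on every pass.
PROMPT = (
    "Implement everything in ./task-plan.md. "
    "Check off completed items. Stop when all tests pass."
)


def run_until_green(
    run_agent: Callable[[str], None],
    tests_pass: Callable[[], bool],
    max_iterations: int = 10,
) -> int:
    """Repeat the same prompt until the tests pass.

    Returns the number of passes used, or -1 if the cap was reached first.
    Assumption: each run_agent call starts with a clean context, so the
    only state carried between passes is the codebase and the plan file.
    """
    for iteration in range(1, max_iterations + 1):
        run_agent(PROMPT)
        if tests_pass():
            return iteration
    return -1


def command_succeeds(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0
```

In practice you would pass something like lambda: command_succeeds(["pytest"]) as tests_pass, and wrap your own tool’s command line in run_agent. The iteration cap is there so a spec that can never go green doesn’t loop forever.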
Pillar 3: Validation
Validation is non-negotiable. Without it, how do you know when to stop?
If “done” is subjective, you’ll either quit too early or loop forever. You need clear, automated criteria that prove completion.
Build test requirements directly into your spec:
- Unit tests for each feature
- Integration tests for key workflows
- Linting and type checking passing
- “Done” = all tests green
When tests are part of the spec, AI can verify its own work. Each iteration, it runs the tests, sees what’s failing, and knows exactly what still needs attention. The feedback loop tightens. Progress becomes measurable.
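To make “all tests green” concrete, here is one possible shape for a completion gate: a named set of commands, where “done” means every one of them exits with status 0. The function names and the example checks are assumptions for illustration; swap in whatever test, lint, and type-check commands your project already uses.

```python
import subprocess
import sys
from typing import Sequence


def failing_checks(checks: dict[str, Sequence[str]]) -> list[str]:
    """Run each named command and return the names of those that failed."""
    failed = []
    for name, command in checks.items():
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


def is_done(checks: dict[str, Sequence[str]]) -> bool:
    """'Done' means every check passed, with no judgment calls involved."""
    return not failing_checks(checks)


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere. In a real project these
    # might be your test runner, linter, and type checker.
    demo_checks = {
        "unit tests": [sys.executable, "-c", "raise SystemExit(0)"],
        "lint": [sys.executable, "-c", "raise SystemExit(1)"],
    }
    print("still failing:", failing_checks(demo_checks))
    print("done:", is_done(demo_checks))
```

The list of failing check names is the useful part: it tells the next iteration exactly where to look instead of leaving “is it finished?” open to interpretation.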
When This Works Best
This approach isn’t for everything. It shines when you have:
- Large refactors across many files — migrating from one framework to another, updating API patterns codebase-wide
- Codebase-wide migrations — converting JavaScript to TypeScript, updating deprecated dependencies
- Feature implementation with clear test criteria — building a checkout flow where “done” means all tests pass
- Mechanical, repetitive changes — adding logging to every endpoint, standardizing error handling
The common thread: well-defined scope plus automated verification. If you can describe exactly what “done” looks like and a machine can check it, this method works.
What This Isn’t
A word of caution: this is NOT “set it and forget it.”
You still need to:
- Have crystal clear requirements upfront
- Write a spec specific enough to follow
- Review the plan before implementation starts
- Check the output when it’s done
This method amplifies your clarity. If your spec is garbage, you’ll get garbage output — just more of it, faster. The loop can’t fix unclear thinking. It can only execute on what you’ve defined.
The human judgment happens at the bookends: designing the spec and reviewing the result. The repetition handles the mechanical grind in between.
Automating the Loop: /ralph-loop
Everything above works manually. But if you’re using Claude Code, there’s a shortcut.
Claude Code has a plugin that provides a /ralph-loop command, based on Geoffrey Huntley’s “Ralph Wiggum” technique. It automates the repetition part.
The plugin intercepts Claude’s exit attempts, feeds the same prompt back in, and loops until your completion criteria are met. You define success, set an iteration limit, and let it run.
Here’s what it looks like in practice:
/ralph-loop "Implement everything in ./task-plan.md. Output <promise>COMPLETE</promise> when all tests pass." --max-iterations 30
The --max-iterations flag controls cost — autonomous loops burn through tokens quickly, so you want a ceiling. The <promise>COMPLETE</promise> tag tells the loop when to stop. When Claude outputs that exact string, the plugin knows the task is done.
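To show how the two stopping conditions interact, here is a small sketch of the decision a loop like this has to make after each pass. It is my own illustration of the idea, not the plugin’s actual implementation; completion_reached and should_continue are hypothetical names.

```python
import re

# Captures whatever text appears between <promise> and </promise>.
PROMISE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)


def completion_reached(output: str, expected: str = "COMPLETE") -> bool:
    """True only if the output contains the promise tag with the expected text."""
    return any(text.strip() == expected for text in PROMISE.findall(output))


def should_continue(output: str, iteration: int, max_iterations: int) -> bool:
    """Keep looping while under the iteration cap and no promise was emitted."""
    return iteration < max_iterations and not completion_reached(output)


if __name__ == "__main__":
    print(should_continue("3 tests still failing", 3, 30))
    print(should_continue("All green. <promise>COMPLETE</promise>", 3, 30))
    print(should_continue("3 tests still failing", 30, 30))
```

Matching the full tag rather than the bare word matters: output that merely mentions “COMPLETE” in passing should not end the loop.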
Same concept as the manual approach. Automated execution.
The Mental Model Matters More Than the Tool
The /ralph-loop plugin is convenient, but the tool isn’t the point. The mental model is.
Spec-driven: Write a detailed, verifiable plan before you start coding.
Iterative: Accept that complex tasks take multiple passes. Design for it.
Verifiable: Define “done” in terms a machine can check. Tests, linting, type checking — something objective.
Invest time in a solid plan. Let repetition handle the grind. Let tests prove completion.
AI doesn’t need to be perfect on the first try. It needs to keep going until the spec is fulfilled. Build your workflow around that reality, and the 30% wall stops being a problem.