What If Your AI Learned From Every Edit You Make?
Every AI tool I use has the same problem. It doesn’t remember what worked.
I generate a meeting briefing. It’s too long, too generic, starts with “I’d be happy to help you prepare for…” — so I cut half of it, rewrite the intro, add a risk section. The result is good. Next week, same meeting type, I generate again. Same bloated output. Same edits. Same waste of time.
The AI didn’t learn anything from my corrections. And it won’t. Because the feedback loop is broken.
The Missing Loop
Here’s how every AI-powered tool works today:
Prompt → LLM → Output → User edits it → Done.
That last step — “user edits it” — is pure gold. It’s implicit feedback. Every deletion says “this wasn’t useful.” Every addition says “this was missing.” Every rewrite says “this is how I actually want it.”
But nobody captures it. The edits disappear into a Google Doc or a Notion page, and the AI starts from scratch next time.
What if we closed the loop?
Prompt → LLM → Output → User edits → System measures delta
  ↑                                             │
  ├── Extract conventions ←─────────────────────┘
  ├── Select best-of-N examples
  └── Inject into next prompt
No fine-tuning. No retraining. No million-dollar GPU bill. Just smart prompt engineering that evolves with every interaction.
How It Works: Three Mechanisms
1. Edit-Distance Tracking
When a user edits an AI-generated output, the system calculates a normalized edit distance between the original and the final version, on a 0-to-1 scale. A score of 0.0 means “perfect, no changes.” A score of 0.8 means “rewrote 80% of it.”
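One way to compute this score is with Python’s standard-library difflib. A minimal sketch — the 0-to-1 scale matches the scores above, though the exact similarity metric is an implementation choice:

```python
from difflib import SequenceMatcher

def edit_score(original: str, edited: str) -> float:
    """Return 0.0 for an untouched output, approaching 1.0 for a full rewrite."""
    similarity = SequenceMatcher(None, original, edited).ratio()
    return round(1.0 - similarity, 2)

print(edit_score("Next steps: ship it.", "Next steps: ship it."))  # 0.0
```

Ratio-based similarity is cheap and order-sensitive; true Levenshtein distance would work just as well here.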
Over time, patterns emerge:
- Section “Introduction” has a 20% survival rate → remove it from the template
- Section “Next Steps” has a 95% survival rate → always include it
- Average output length gets cut by 50% → tell the LLM to be shorter
This is what I call the Template Genome — a data structure that tracks which parts of a generated output survive user editing. Sections with low survival rates get deprioritized. Sections that are always added manually get included by default. The template evolves.
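A Template Genome can be sketched as a per-section survival counter. The 30% cutoff below is illustrative, not a recommendation:

```python
from collections import defaultdict

class TemplateGenome:
    """Tracks how often each section of a template survives user editing."""

    def __init__(self):
        self.generated = defaultdict(int)  # times a section was generated
        self.survived = defaultdict(int)   # times it survived editing intact

    def record(self, section: str, survived: bool):
        self.generated[section] += 1
        if survived:
            self.survived[section] += 1

    def survival_rate(self, section: str) -> float:
        if self.generated[section] == 0:
            return 1.0  # no data yet: keep the section
        return self.survived[section] / self.generated[section]

    def next_template(self) -> list[str]:
        """Drop sections whose survival rate has fallen below 30%."""
        return [s for s in self.generated if self.survival_rate(s) >= 0.3]
```

After a few generations, a section like “Introduction” with a 20% survival rate simply stops appearing in the template.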
2. Convention Extraction
Every time a user corrects a pattern, the system extracts a rule:
- User always deletes opening pleasantries → Convention: “Start directly with the content. No preamble.”
- User always adds a budget section to project reports → Convention: “Include budget/cost implications.”
- User changes “utilize” to “use” every time → Convention: “Use simple language.”
These conventions are stored in a database — per user, per team, per output type. Before the next generation, relevant conventions are injected into the system prompt:
## Conventions (follow these rules):
- Maximum 400 words. No filler.
- Always end with a “Next Steps” section.
- Start with KPIs and numbers, then details.
- No bullet lists longer than 5 items.
The LLM sees these rules and follows them. No fine-tuning needed. The prompt itself evolves.
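The storage and injection step really is just CRUD plus string formatting. A sketch, with a dict standing in for the database and all names illustrative:

```python
from collections import defaultdict

# Conventions keyed by (user_id, output_type). In production this would be
# a database table; a dict shows the shape. Names are illustrative.
conventions: dict[tuple[str, str], list[str]] = defaultdict(list)

def add_convention(user_id: str, output_type: str, rule: str):
    conventions[(user_id, output_type)].append(rule)

def build_system_prompt(base_prompt: str, user_id: str, output_type: str) -> str:
    """Append the user's stored conventions to the base system prompt."""
    rules = conventions[(user_id, output_type)]
    if not rules:
        return base_prompt
    section = "\n".join(f"- {r}" for r in rules)
    return f"{base_prompt}\n\n## Conventions (follow these rules):\n{section}"
```

Every generation call goes through `build_system_prompt`, so new conventions take effect on the very next request.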
3. Few-Shot Selection
The system keeps track of which outputs got the highest ratings (explicit thumbs up) or the lowest edit distance (implicit quality signal). When generating a new output of the same type, it selects the best 2-3 previous outputs as few-shot examples:
Here are examples of meeting briefings that were rated highly:
[Example 1: Customer meeting, compact, with negotiation strategy]
[Example 2: Quarterly review, KPI-focused, with recommendations]
Now generate a briefing for: [new meeting]
This is radically more effective than generic few-shot examples from a training set. These examples come from the user’s own work, reflect their preferences, and are proven to be good — because the user said so.
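The selection itself can be a simple sort: prefer explicitly liked outputs, break ties by the lowest edit distance. A sketch — the history record format is an assumption:

```python
def select_few_shot(history: list[dict], output_type: str, k: int = 3) -> list[str]:
    """Pick the k best past outputs of this type as few-shot examples.

    Each history entry is assumed to look like:
      {"type": "briefing", "text": "...", "rating": 1, "edit_score": 0.1}
    where rating is a thumbs signal (+1/0/-1) and edit_score runs from
    0.0 (untouched) to 1.0 (fully rewritten). Field names are illustrative.
    """
    candidates = [h for h in history if h["type"] == output_type]
    # Highest rating first; among equally rated outputs, least-edited first.
    candidates.sort(key=lambda h: (-h["rating"], h["edit_score"]))
    return [h["text"] for h in candidates[:k]]
```

The selected texts are then pasted into the prompt exactly as in the example above.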
Why This Doesn’t Exist Yet
The individual pieces exist. Edit distance is a solved problem (Levenshtein distance, Python’s difflib.SequenceMatcher). Few-shot learning is standard. Convention databases are just CRUD operations. The LLM providers already support system prompts with rules.
Researchers have explored evolutionary algorithms for prompt optimization (EvoPrompt, ICLR 2024) and preference-driven refinement. But these approaches optimize for benchmark scores or require explicit feedback collection. What’s missing is the implicit loop — learning from the edits users already make, without asking them to rate anything.
What doesn’t exist is the closed loop at the application level.
- GitHub Copilot generates code but doesn’t learn from your code reviews.
- ChatGPT generates text but doesn’t learn from your edits.
- Notion AI creates summaries but doesn’t track what you change.
- SonarQube finds problems but doesn’t improve generation.
Every tool does one half. Generate, or evaluate. Nobody connects the output to the input in a feedback loop that improves the prompt.
The reason is architectural: most AI tools treat the LLM as a black box. Input goes in, output comes out. But between “output” and “user is satisfied” there’s an entire optimization space that nobody is mining.
The Three Learning Levels
This isn’t just about one user getting better outputs. The system can learn at three levels:
Individual: What works for this specific user? Thomas cuts briefings by half. Sarah always adds risk sections. Marcus prefers bullet points over prose.
Team / Organization: What works for this company? Engineering reports always need architecture diagrams. Sales briefings always need competitor comparison. All outputs should reference internal data sources.
Cross-Type: What works for this output type? The best project status reports across all users have: KPIs first, risks second, next steps third. The best meeting briefings have: context, open items, conversation strategy.
Level 1 is personalization. Level 2 is organizational knowledge. Level 3 is emergent best practices that no one designed — they evolved from collective usage.
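The three levels can be modeled as a scoped lookup, merging the most general rules first so personal preferences are read last (and effectively win). A sketch — the key format is an assumption:

```python
def gather_conventions(store: dict, user: str, team: str, output_type: str) -> list[str]:
    """Merge conventions from all three learning levels, most general first.

    store maps scope keys to rule lists; the key format is illustrative:
      ("type", output_type)        - level 3: cross-type best practices
      ("team", team, output_type)  - level 2: organizational knowledge
      ("user", user, output_type)  - level 1: personal preferences
    """
    scopes = [
        ("type", output_type),
        ("team", team, output_type),
        ("user", user, output_type),
    ]
    rules = []
    for key in scopes:
        rules.extend(store.get(key, []))
    return rules
```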
What This Enables: The Proactive AI
Once the system learns what works, it doesn’t need to wait for prompts. It can act proactively:
- Monday morning: “Here’s your week briefing. 3 meetings, 2 overdue tasks, 1 new competitive finding.”
- Before a meeting: “Dossier for your 2pm call with Client X. Last emails, pipeline status, open proposals.”
- Friday: “Week review. 5 tasks completed, 2 new leads, content performance update.”
The AI isn’t just responding. It’s anticipating. And because it learned from months of feedback what you actually find useful, these proactive reports aren’t generic spam — they’re tailored, evolved, and relevant.
The Economics
Fine-tuning a model costs thousands of dollars and takes weeks. The result is a static model that can’t adapt to changing preferences.
Evolutionary Prompt Optimization costs nothing beyond normal API usage. Convention injection adds ~500 tokens to the system prompt. Few-shot examples add ~1000 tokens. At current pricing, that’s less than $0.01 per generation.
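A quick sanity check on that claim — the $3-per-million-input-tokens price is an assumption for illustration; actual provider pricing varies:

```python
# Back-of-envelope cost of the extra prompt context per generation.
extra_tokens = 500 + 1000            # conventions + few-shot examples
price_per_token = 3.00 / 1_000_000   # assumed $3 per 1M input tokens
cost = extra_tokens * price_per_token
print(f"${cost:.4f} per generation")  # $0.0045 per generation
```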
And it adapts in real-time. Change your preferences today, see better output tomorrow. No retraining, no deployment, no waiting.
Try This Yourself
You don’t need a sophisticated system to start. Here’s the minimum viable version:
- Track your edits. Next time you rewrite an AI output, note what you changed. After 10 edits, you’ll see patterns.
- Write your conventions. Turn those patterns into rules. “Always start with data.” “Never use more than 300 words.” “Include a recommendation, not just analysis.”
- Add them to your prompt. Put your conventions in the system prompt or custom instructions. You’ll see immediate improvement.
- Save your best outputs. When an AI output needs zero edits, save it as a template. Use it as a few-shot example next time.
That’s the loop. Manual today, automated tomorrow.
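Even step one can be semi-automated with a few lines of Python: diff each original output against your edited version and keep only what you deleted and added. A minimal sketch using the standard library:

```python
from difflib import unified_diff

def log_edits(original: str, edited: str) -> list[str]:
    """Return the lines you deleted (-) and added (+), ready to scan
    for recurring patterns after a handful of generations."""
    diff = unified_diff(original.splitlines(), edited.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("-", "+"))
            and not line.startswith(("---", "+++"))]
```

Run it after every generation, append the result to a file, and the patterns behind your conventions surface on their own.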
Where This Goes
I believe Evolutionary Prompt Optimization is the missing layer between users and LLMs. Not a new model. Not a new provider. A feedback layer that sits on top of any LLM and makes it better over time — for each user, each team, each use case.
The first generation of AI tools was about access: give everyone an LLM. The second generation was about integration: connect LLMs to your data. The third generation will be about evolution: AI that learns from how you use it, not from how it was trained.
We’re at the beginning of generation three.
Thomas Koerting is a sales and marketing professional in the business intelligence space. He writes about the intersection of AI, productivity, and how humans and machines learn from each other.