The idea: agents can improve just by reflecting on their own execution traces.
How it works: Agent runs → reflects on what worked/failed → curates strategies into a "playbook" → injects playbook on next run.
No fine-tuning, no training data.
Results on browser-use: 30% → 100% success rate, 82% fewer steps, 65% token savings.
Also works with local models and really helps them punch above their weight to match closed-source models.