Five rounds on a MySQL database (29 tables, 590MB). Key results:
- Five-Gate rejection rate: 63.6% — most interactions produce no knowledge change - Incremental convergence: +75 → +46 → +12 → +21 → +1 - Gate 2 self-correction: caught and fixed 2 erroneous rules the Skill had written earlier - Round 5: zero exploration steps, direct template reuse - Accuracy: 100% (no incorrect knowledge survived)
Unexpected finding: tool usage pitfalls were captured as a high-value byproduct.
A second experiment on a larger telecom billing database is in progress for cross-domain validation.