The unexpected finding: the model discovered Occam's Razor on its own.
Starting accuracy: 51.3% (zero-shot baseline)
After learning: 78.0% (+26.7 percentage points)
But the numbers don't tell the full story. The learning journals reveal something profound:
Phase 1: The model hallucinated complex solutions ("use interval trees!", "apply graph theory!"). Accuracy stayed low (~35%).
Phase 2: Journal entries started showing doubt: "Since the problem is straightforward, focusing on basic interval checking..."
Phase 3: The breakthrough - the model wrote: "This suggests a fundamental misunderstanding of how to handle overlapping intervals."
It admitted it was wrong. From that point on, accuracy climbed toward the final 78.0%.
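For anyone wondering how "learning" happens here without weight updates: as I understand the setup, each journal entry is fed back into the context of the next attempt, so a reflection like the Phase 3 one directly steers later answers. A minimal sketch; the `model.solve`/`model.reflect` interface is my assumption, not the actual API:

```python
def run_episode(model, task, journal):
    """One learning step: attempt the task with the journal in-context,
    then reflect and append the reflection for the next attempt."""
    context = "\n".join(journal)                  # accumulated reflections
    answer = model.solve(task, context=context)   # attempt, journal in-context
    entry = model.reflect(task, answer)           # self-critique in plain text
    journal.append(entry)                         # carried into the next episode
    return answer
```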
The distillation process acts as evolutionary selection: simple ideas that work survive, complex ideas that fail get filtered out.
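Here is a minimal sketch of what that selection pressure could look like; `evaluate` (held-out accuracy with a strategy prepended to the prompt) and the revision step are my assumptions, not the repo's actual code:

```python
def distill(candidates, evaluate, rounds=5, keep=3):
    """Evolutionary selection over plain-text strategy documents.

    candidates: list[str] of strategies proposed by the model
    evaluate:   callable(str) -> float, accuracy on a held-out set
    """
    pool = list(candidates)
    for _ in range(rounds):
        # Score every strategy; simple ideas that work float to the top,
        # complex ideas that fail get filtered out.
        pool.sort(key=evaluate, reverse=True)
        pool = pool[:keep]
        # The real loop would have the model revise each survivor here;
        # a string marker stands in for that step.
        pool += [s + "\n(revised)" for s in pool]
    return max(pool, key=evaluate)
```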
Key advantages:
- Fully interpretable (read the complete thought process)
- Runs on consumer hardware (no GPU training)
- Strategies are transferable text documents (see the sketch below)
- Models learn to doubt themselves (AI safety implication)
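On transferability: since a distilled strategy is just a text file, moving it to a different model is a matter of prepending it to the prompt. The file name and prompt layout below are illustrative, not from the repo:

```python
from pathlib import Path

# Hypothetical path; any distilled strategy document works the same way.
strategy = Path("strategies/interval_overlap.txt").read_text()

problem = "Given a list of intervals, return True if any two overlap."
prompt = f"{strategy}\n\nProblem:\n{problem}"
# The prompt can now go to any model or API; no weights or GPUs involved:
# answer = client.complete(prompt)
```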
All code and papers are open source. The experiment takes ~40 minutes to reproduce on a laptop.
Happy to answer questions about the approach, results, or implementation!