I've just published a technical book exploring the "Tongue-Ear Problem" in AI: LLMs can generate plausible text about a tongue touching an ear, but they lack a body with which to run the experiment and verify the claim. They have linguistic competence without grounded meaning.
The book tests this philosophy with two real-world experiments:
1. The Algorithm Vortex (Ch 2): Instead of writing solvers, I "hired" agents to invent them. Using a "zero framework" approach (just bash + an evaluator; see the sketch after this list), an autonomous agent discovered a "Diagonal Layering" strategy for circle packing. It matched the state-of-the-art result from DeepMind's AlphaEvolve (2.636 vs 2.635), but without the complex evolutionary framework.
2. System 3 Architecture (Ch 4): To fix the grounding issue in coding, I built an "epistemic" agent with an external verification scaffold (tracking tool reliability and failure memory). I ran it against a baseline on SWE-bench Verified:
- Solve Rate dropped: The baseline solved 50%, my epistemic agent only 40%.
- Code Hygiene skyrocketed: The epistemic agent’s patches were 57% smaller on average (269 lines vs 620 lines).
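For concreteness, here is a minimal sketch of what the evaluator half of that zero-framework loop could look like, assuming an AlphaEvolve-style formulation (maximize the sum of radii of circles packed inside the unit square). The file names and the score function are illustrative assumptions, not the book's actual code.

    # Minimal evaluator sketch (assumed formulation: maximize the sum of radii
    # of circles packed inside the unit square). Names are illustrative.
    import json, math, sys

    TOL = 1e-9  # numerical slack for containment/overlap checks

    def score(circles):
        """circles: list of (x, y, r). Returns sum of radii, or -inf if invalid."""
        for x, y, r in circles:
            # every circle must lie fully inside the unit square
            if (r <= 0
                    or x - r < -TOL or x + r > 1 + TOL
                    or y - r < -TOL or y + r > 1 + TOL):
                return float("-inf")
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                xi, yi, ri = circles[i]
                xj, yj, rj = circles[j]
                # circles may touch but not overlap
                if math.hypot(xi - xj, yi - yj) < ri + rj - TOL:
                    return float("-inf")
        return sum(r for _, _, r in circles)

    if __name__ == "__main__":
        # e.g. invoked from a bash loop: python evaluate.py candidate.json
        with open(sys.argv[1]) as f:
            print(score([tuple(c) for c in json.load(f)]))

In the zero-framework setup, the agent would presumably just keep proposing candidate configurations from bash, call the evaluator, and hold on to the best score it has seen.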
The epistemic agent traded raw capability for surgical focus. It stopped guessing and started checking its "trust stack" (sketched below).
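For readers curious what that "trust stack" might look like concretely, here is a minimal sketch assuming the two ingredients named above (per-tool reliability tracking and a failure memory). The TrustStack class and its methods are illustrative assumptions, not the book's actual implementation.

    # Minimal sketch of an external verification scaffold: per-tool reliability
    # plus a memory of past failures. Names are illustrative, not the book's API.
    from collections import defaultdict
    from dataclasses import dataclass, field

    @dataclass
    class TrustStack:
        successes: dict = field(default_factory=lambda: defaultdict(int))
        failures: dict = field(default_factory=lambda: defaultdict(int))
        failure_memory: list = field(default_factory=list)  # (tool, context, error)

        def record(self, tool, ok, context="", error=""):
            # log every tool call; failed calls also go into failure memory
            if ok:
                self.successes[tool] += 1
            else:
                self.failures[tool] += 1
                self.failure_memory.append((tool, context, error))

        def reliability(self, tool):
            # Laplace-smoothed success rate; an unseen tool starts at 0.5
            s, f = self.successes[tool], self.failures[tool]
            return (s + 1) / (s + f + 2)

        def should_verify(self, tool, threshold=0.8):
            # low-reliability tools trigger an extra verification step
            return self.reliability(tool) < threshold

One plausible consequence of a scaffold like this: before accepting output produced via a low-reliability tool, the agent re-runs the relevant checks instead of guessing, which would be consistent with the smaller, more surgical patches reported above.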
It is free on Leanpub (set the price slider to $0). I'd love feedback.