Hi HN, Karpathy's recent post [1] described Claws as "a new layer on top of LLM agents, taking orchestration, scheduling, context, tool calls to a next level." That's the right framing - but orchestration alone isn't enough. SAIA is the rails layer that makes that orchestration predictable.
Instead of prompting an LLM and hoping it does what you meant, the idea is to write in 12 verbs (ASK, VERIFY, CRITIQUE, REFINE, etc.) with typed outputs - each verb returns a dataclass, enforced by JSON schema at the API level. The name comes from SCUMM - the scripting language LucasArts used for Monkey Island. Constrained vocabulary, structured outputs, debuggable behavior.
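To make that concrete, here's a minimal sketch of what "each verb returns a dataclass, enforced by JSON schema" could look like. All names (`VerifyResult`, `schema_for`, `verify`) are illustrative, not the actual llm-saia API, and the model call is stubbed out:

```python
from dataclasses import dataclass, fields
import json

@dataclass
class VerifyResult:
    passed: bool
    reason: str

# Derive a minimal JSON schema from the dataclass fields.
TYPE_MAP = {bool: "boolean", str: "string", int: "integer", float: "number"}

def schema_for(cls) -> dict:
    return {
        "type": "object",
        "properties": {f.name: {"type": TYPE_MAP[f.type]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
    }

def verify(claim: str) -> VerifyResult:
    # Stand-in for the LLM response; the real protocol would send the
    # schema with the API request (structured outputs) and validate the reply.
    raw = '{"passed": true, "reason": "arithmetic checks out"}'
    data = json.loads(raw)
    missing = set(schema_for(VerifyResult)["required"]) - data.keys()
    if missing:
        raise ValueError(f"model omitted required fields: {missing}")
    return VerifyResult(**data)
```

The point of the constraint: the caller always gets a `VerifyResult` back or a hard error, never free-form text it has to re-parse.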
The bigger goal: agents that actually improve over time. What I've learned building these is that without training, agents plateau quickly. They can remember facts, but they don't get better at their job. So feedback from execution flows into fine-tuning, and the model gets better at the specific task. Not "memory," but real learning.
For that to work, I needed to build multiple layers:
- *llm-saia*: the protocol layer (this post) - rails between Python and LLM
- *llm-infer*: inference server (vLLM, LoRA support)
- *llm-kelt*: feedback collection → fine-tuning pipeline
- *llm-gent*: agent runtime with traits, tools, persistence
- *appinfra*: production Python infrastructure that holds it all together
Everything is open source. Happy to discuss design tradeoffs - the 12-verb constraint is intentionally limiting.
This is v0 - the vocabulary will evolve. If there's prior work I should know about, drop a link.
Open problems worth solving:
- *Determinism*: same input → same output. Current idea: fine-tune models to follow verb contracts reliably.
- *Verification*: how do you prove a verb did what it claimed? Tracing helps, but formal guarantees need real PL expertise.
- *Composition*: when verbs chain, errors compound. Better error propagation and recovery needed.
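On the composition point, one direction I'm exploring is explicit short-circuiting, so a failing verb stops the chain instead of feeding garbage downstream. A toy sketch (the `StepResult`/`chain` names are hypothetical, not part of llm-saia):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StepResult:
    ok: bool
    value: object = None
    error: str = ""

def chain(value, *steps: Callable[[object], StepResult]) -> StepResult:
    """Run verbs in order; stop at the first failure so errors
    don't compound silently down the pipeline."""
    current = value
    for step in steps:
        result = step(current)
        if not result.ok:
            return result  # propagate the failing step's error as-is
        current = result.value
    return StepResult(ok=True, value=current)

# Two toy verbs standing in for ASK and VERIFY.
def ask(q):
    return StepResult(ok=True, value=f"draft answer to: {q}")

def verify(a):
    return StepResult(ok=("draft" in a), value=a, error="" if "draft" in a else "claim not supported")
```

Recovery (retry, fall back to a different verb) is the harder half of the problem; this only covers propagation.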
[1] https://simonwillison.net/2026/Feb/21/claws/