Hi HN, Karpathy's recent post [1] described Claws as "a new layer on top of LLM agents, taking orchestration, scheduling, context, tool calls to a next level." That's the right framing - but orchestration alone isn't enough. SAIA is the rails layer that makes that orchestration predictable.
Instead of prompting an LLM and hoping it does what you meant, the idea is to write in 12 verbs (ASK, VERIFY, CRITIQUE, REFINE, etc.) with typed outputs - each verb returns a dataclass, enforced by JSON schema at the API level. The name comes from SCUMM - the scripting language LucasArts used for Monkey Island. Constrained vocabulary, structured outputs, debuggable behavior.
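To make that concrete, here's a minimal sketch of what "each verb returns a dataclass, enforced by JSON schema" could look like. All names (`VerifyResult`, `schema_for`, `verify`) are illustrative, not the actual llm-saia API, and the model call is stubbed out:

```python
from dataclasses import dataclass, fields
import json

@dataclass
class VerifyResult:
    passed: bool
    reason: str

# Derive a minimal JSON schema from the dataclass fields.
TYPE_MAP = {bool: "boolean", str: "string", int: "integer", float: "number"}

def schema_for(cls) -> dict:
    return {
        "type": "object",
        "properties": {f.name: {"type": TYPE_MAP[f.type]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
    }

def verify(claim: str) -> VerifyResult:
    # Stand-in for the LLM response; the real protocol would send the
    # schema with the API request (structured outputs) and validate the reply.
    raw = '{"passed": true, "reason": "arithmetic checks out"}'
    data = json.loads(raw)
    missing = set(schema_for(VerifyResult)["required"]) - data.keys()
    if missing:
        raise ValueError(f"model omitted required fields: {missing}")
    return VerifyResult(**data)
```

The point of the constraint: the caller always gets a `VerifyResult` back or a hard error, never free-form text it has to re-parse.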
The bigger goal: agents that actually improve over time. What I've learned building these is that without training, agents plateau quickly. They can remember facts, but they don't get better at their job. So feedback from execution flows into fine-tuning, and the model gets better at the specific task. Not "memory," but real learning.
For that to work, I needed to build multiple layers:
- *llm-saia*: the protocol layer (this post) - rails between Python and LLM
- *llm-infer*: inference server (vLLM, LoRA support)
- *llm-kelt*: feedback collection → fine-tuning pipeline
- *llm-gent*: agent runtime with traits, tools, persistence
- *appinfra*: production Python infrastructure that holds it all together
Everything is open source. Happy to discuss design tradeoffs - the 12-verb constraint is intentionally limiting.
This is v0 - the vocabulary will evolve. If there's prior work I should know about, drop a link.
Open problems worth solving:
- *Determinism*: same input → same output. Current idea: fine-tune models to follow verb contracts reliably.
- *Verification*: how do you prove a verb did what it claimed? Tracing helps, but formal guarantees need real PL expertise.
- *Composition*: when verbs chain, errors compound. Better error propagation and recovery needed.
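On the composition point, one direction I'm exploring is explicit short-circuiting, so a failing verb stops the chain instead of feeding garbage downstream. A toy sketch (the `StepResult`/`chain` names are hypothetical, not part of llm-saia):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StepResult:
    ok: bool
    value: object = None
    error: str = ""

def chain(value, *steps: Callable[[object], StepResult]) -> StepResult:
    """Run verbs in order; stop at the first failure so errors
    don't compound silently down the pipeline."""
    current = value
    for step in steps:
        result = step(current)
        if not result.ok:
            return result  # propagate the failing step's error as-is
        current = result.value
    return StepResult(ok=True, value=current)

# Two toy verbs standing in for ASK and VERIFY.
def ask(q):
    return StepResult(ok=True, value=f"draft answer to: {q}")

def verify(a):
    return StepResult(ok=("draft" in a), value=a, error="" if "draft" in a else "claim not supported")
```

Recovery (retry, fall back to a different verb) is the harder half of the problem; this only covers propagation.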
[1] https://simonwillison.net/2026/Feb/21/claws/