Starting to run local inference has highlighted something I've been aware of for a while: just running tests dumps shedloads of text into the context window, where it stays until compaction or a fresh session. For example, a single `cargo test` can dump 8KB into the agent's context just to communicate "47 tests passed." The agent reads all of it, learns nothing useful, and the context window fills with noise. That makes LLM prefill slower and costs more on per-token APIs.
I created a small program that sits between the command output and the LLM: oo, or double-o ... yes, a sad play on words. Double-o, the agent's best friend :)
oo wraps commands and classifies their output:
- Small output (<4KB): passes through unchanged
- Known success pattern: one-line summary (`oo cargo test` → `cargo test (47 passed, 2.1s)`)
- Failure: filtered to actionable errors
- Large unknown output: indexed locally, queryable via oo recall
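The decision tree above can be sketched roughly like this. This is a simplified illustration, not oo's actual code: the enum names, the hard 4KB threshold, and the plain string matching all stand in for the real regex-based engine.

```rust
// Illustrative sketch of oo's classification flow; names, the 4KB
// threshold, and the string matching are stand-ins, not the real internals.
const SMALL_LIMIT: usize = 4 * 1024;

#[derive(Debug, PartialEq)]
enum Verdict {
    PassThrough,      // small output: forwarded unchanged
    Summary(String),  // known success pattern: one-line summary
    Filtered(String), // failure: actionable errors only
    Indexed,          // large unknown output: stored for `oo recall`
}

fn classify(cmd: &str, output: &str, exit_ok: bool) -> Verdict {
    if output.len() < SMALL_LIMIT {
        return Verdict::PassThrough;
    }
    if !exit_ok {
        // simplified "grep" failure strategy: keep only error-looking lines
        let errors: Vec<&str> = output
            .lines()
            .filter(|l| l.contains("error") || l.contains("FAILED"))
            .collect();
        return Verdict::Filtered(errors.join("\n"));
    }
    // known success pattern, e.g. cargo test's "test result: ok. ..." line
    if cmd.starts_with("cargo test") {
        if let Some(line) = output.lines().find(|l| l.starts_with("test result: ok")) {
            return Verdict::Summary(format!("cargo test ({})", line));
        }
    }
    Verdict::Indexed
}
```

The key property is that the pass-through check comes first: small output is never touched, so the wrapper is a no-op for most quick commands.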
It currently ships with 10 built-in patterns (pytest, cargo test, go test, jest, eslint, ruff, cargo build, cargo clippy, go build, tsc), but users can add their own via TOML files or use `oo learn <cmd>` to have an LLM generate one from real command output (currently Anthropic models only). No agent modification needed: add "prefix commands with oo" to your system prompt. Single Rust binary, 197 tests, Apache-2.0.
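A user-supplied pattern file might look something like this. This is a hypothetical sketch: the field names and schema are illustrative guesses, not oo's documented TOML format.

```toml
# Hypothetical pattern definition; field names are illustrative.
[commands."cargo test"]
category = "status"
failure_strategy = "grep"

[commands."cargo test".success]
# regex with named capture groups used to build the one-line summary
pattern = 'test result: ok\. (?P<passed>\d+) passed.*finished in (?P<time>[\d.]+s)'
summary = "cargo test ({passed} passed, {time})"
```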
The classification engine uses regex-based pattern matching with per-command failure strategies (tail, head, grep, between) and automatic command categorization (status/content/data/unknown) that determines what happens to unrecognized commands. Content commands like `git diff` always pass through; data commands like `git log` get indexed when large.
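The categorization step might look roughly like this. An illustrative sketch only: the prefix-to-category assignments and the size cutoff are assumptions, not oo's actual tables.

```rust
// Hypothetical sketch of command categorization; the prefix lists and
// the 4KB cutoff are illustrative assumptions.
#[derive(Debug, PartialEq)]
enum Category {
    Status,  // summarizable (test runners, linters, builds)
    Content, // the output is the point: always pass through
    Data,    // bulk output: index when large
    Unknown, // fall back to size-based handling
}

fn categorize(cmd: &str) -> Category {
    if cmd.starts_with("git diff") {
        Category::Content
    } else if cmd.starts_with("git log") {
        Category::Data
    } else if cmd.starts_with("cargo test") {
        Category::Status
    } else {
        Category::Unknown
    }
}

fn handle(cat: &Category, bytes: usize) -> &'static str {
    match cat {
        Category::Status => "match success/failure patterns",
        Category::Content => "pass through",
        Category::Data | Category::Unknown if bytes >= 4 * 1024 => "index for `oo recall`",
        _ => "pass through", // small data/unknown output
    }
}
```

Treating unknown commands like data commands is the conservative default: nothing is lost, it just moves out of the context window and behind `oo recall`.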
The difference is especially noticeable with local models, where wall-clock time matters most. It helps with frontier models too ... cleaner context, fewer confused follow-ups.