I've been developing a practice I call OMLCP (Output-Maximizing
Long-Context Programming), which exploits modern models' large
output windows instead of optimizing for frequent small interactions.
The core argument: agentic workflows made sense when max output was
4k tokens. Frontier models now support 64k-128k output tokens, but
most tooling still optimizes for short responses. The structural
inefficiency compounds quadratically with project size.
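To make the quadratic-vs-linear intuition concrete, here is a minimal sketch of the comparison. All parameter names and rates (`lines_per_round`, `tokens_per_line`) are my own illustrative assumptions, not figures from the paper; the point is only the shape of the curves, not the exact multipliers.

```python
# Hypothetical cost model: an agentic loop re-sends everything written
# so far as input on each round, so total tokens grow roughly
# quadratically; a single long-output pass is linear in project size.
# Parameters are illustrative assumptions, not the paper's figures.

def agentic_tokens(total_lines: int, lines_per_round: int = 200,
                   tokens_per_line: int = 10) -> int:
    """Each round re-reads all prior output as input, then emits a
    small new chunk -- the re-reading term is what compounds."""
    rounds = total_lines // lines_per_round
    total = 0
    written = 0
    for _ in range(rounds):
        total += written * tokens_per_line          # re-read prior output
        total += lines_per_round * tokens_per_line  # new output this round
        written += lines_per_round
    return total

def single_pass_tokens(total_lines: int, tokens_per_line: int = 10) -> int:
    """One long output window: cost scales linearly with size."""
    return total_lines * tokens_per_line

for n in (10_000, 30_000):
    ratio = agentic_tokens(n) / single_pass_tokens(n)
    print(f"{n} lines: ~{ratio:.1f}x more tokens for the agentic loop")
```

Under these toy parameters the ratio roughly triples when the project triples, which is what "compounds quadratically with project size" predicts (quadratic divided by linear grows linearly).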
The paper includes:
- Formal token economics model with sensitivity analysis (13-17x
advantage at 10k lines, 40-45x at 30k lines)
- Three field deployments with real cost figures ($0.58 for 14,431 lines)
- Honest failure cases, including an SSE parsing incident that required
a full re-stream
- A reproducibility protocol for independent validation
- Capability tier framework to make claims model-version-agnostic
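The post does not detail the SSE parsing incident, but the classic pitfall with `text/event-stream` is that network chunks do not align with event boundaries, so a per-chunk parser silently drops or mangles split events. A minimal buffering parser that avoids this, with names of my own invention, looks like:

```python
# Minimal SSE (text/event-stream) parser sketch. Events are delimited
# by a blank line, so the parser must buffer arbitrary network chunks
# until a complete "\n\n"-terminated event is available. This is an
# illustration of the failure mode, not the author's implementation.

class SSEParser:
    def __init__(self):
        self._buf = ""

    def feed(self, chunk: str):
        """Accept an arbitrary network chunk; yield complete events."""
        self._buf += chunk
        while "\n\n" in self._buf:
            raw, self._buf = self._buf.split("\n\n", 1)
            data_lines = [line[5:].lstrip() for line in raw.split("\n")
                          if line.startswith("data:")]
            if data_lines:
                yield "\n".join(data_lines)

parser = SSEParser()
events = []
# One event arrives split across two chunks; naive per-chunk parsing
# would lose the first event entirely.
for chunk in ["data: hel", "lo\n\ndata: world\n\n"]:
    events.extend(parser.feed(chunk))
print(events)  # ['hello', 'world']
```

A parser without the buffer is exactly the kind of bug that forces a full re-stream once detected, since partial events cannot be reconstructed after the fact.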
I'm a business analyst in South Africa. No lab affiliation, no team.
Happy to answer questions about methodology or the streaming
infrastructure.