Claude Code took 0.1s, Cursor CLI 19s
idk what you expect from a question about "how much data". it's tool-based search, so it's a lot.
One of my side projects is a full text index for pattern search, and I'm trying to understand how it might fit with that. You mention tool call overhead, but is that a significant part of the latency in the multi-turn scenario, or is it the coding agent being forced into a serial processing pattern?
for another take on latency attribution see https://x.com/silasalberti/status/1979310181424206143
you can try the playground here: https://playground.cognition.ai/
i wrote a longer explainer here https://x.com/swyx/status/1978874342743343254 but, to save you the click:
this was a perspective cut from the blogpost, but let me explain why subagents kill long context
Like, you could spend $500m building 100-million-token context models, and they would be 1) slow, 2) expensive to use, and 3) have huge context rot. O(n) in context length is the lower bound no matter what you do.
Cog's approach is something you learn on day 1 of CS50: divide and parallelize. Embeddings are too dumb, Agentic Search is too slow. So train limited-agency (max 4 turns), natively parallel tool-calling (avg parallelism of 7-8, custom toolset), fast (2800 tok/s) subagents that give the performance of Agentic Search within an acceptable "Flow Window" that feels immaterially slower than Embeddings. (Rough sketch of the fan-out pattern after the list below.)
The benefit of this is threefold:
- 8^4 (= 4,096) tool calls cover a very large code search space; subagent calls can be compounded if more are needed.
- predictable cost & end-to-end latency
- subagents output "clean" contexts, free of context failure modes like context poisoning and context rot
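To make the fan-out concrete, here's a minimal Python sketch of the shape I'm describing (my illustration, not Cognition's actual code): a few subagents run concurrently, each limited to a handful of rounds of parallel tool calls, and each hands back only a distilled summary. The ripgrep tool, the pre-split query batches, and names like fast_context are stand-ins for what the trained swe-grep models decide on their own.

    import asyncio

    MAX_TURNS = 4   # limited agency: each subagent gets at most 4 rounds of tool calls
    FANOUT = 8      # roughly the avg parallelism of 7-8 tool calls per round

    async def grep_tool(pattern: str) -> str:
        # placeholder search tool: list files matching a pattern with ripgrep
        proc = await asyncio.create_subprocess_exec(
            "rg", "-l", pattern, ".",
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.DEVNULL,
        )
        out, _ = await proc.communicate()
        return out.decode()

    async def run_subagent(queries: list[str]) -> str:
        # one limited-agency subagent: up to MAX_TURNS rounds, each firing up to
        # FANOUT tool calls in parallel, then returning only a short summary
        findings: list[str] = []
        frontier = list(queries)
        for _ in range(MAX_TURNS):
            batch, frontier = frontier[:FANOUT], frontier[FANOUT:]
            if not batch:
                break
            results = await asyncio.gather(*(grep_tool(q) for q in batch))
            findings.extend(r for r in results if r)
            # in the real system the model picks the next batch based on what it saw;
            # here the batches are just pre-split for illustration
        return "\n".join(findings)[:2000]  # distilled, "clean" output for the main agent

    async def fast_context(subtasks: list[list[str]]) -> str:
        # fan out: one subagent per subtask, all running concurrently
        summaries = await asyncio.gather(*(run_subagent(qs) for qs in subtasks))
        return "\n\n".join(summaries)

    # e.g. asyncio.run(fast_context([["parse_config"], ["retry", "backoff"], ["auth_token"]]))

The third bullet above is the key design point this sketch is gesturing at: only the summaries cross back into the main agent's context, so raw tool output never accumulates there.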
we originally called this Rapid Agentic Search, to contrast with RAG, but Fast Context rolls off the tongue better.
-- Second perspective --
The Fundamental Equation of Coding Agents is:
Coding Agent Performance = Ability to Read the Right Files * Ability to Generate the Right Diffs
Fast Context is Cognition's first solution for the Read. As codebases get larger and tasks get more complex, Reads get more important: on the average production codebase, the first query in Cascade is >60% just searching and reading files.
But if this were just about speed, it might not be that exciting. I think there are underappreciated effects on performance as well when you have very good context. In other words:
Context Engineering is Actually Very Important. Too important to leave to humans and hardcoded rules.
The swe-greps are the first dedicated context-engineering agent models.
Most LLM coding is so slow that you're permanently out of flow state and stuck in 'manager' state. I'm interested in a future where you've got enough fast, low-TTFT support that an engineer could maintain flow state and get superpower-level productivity at the same time, and this tool makes me think of that.
That is, it looks fast enough to be used as a sort of sidebar info tool, as in "what you're coding might need / refer to these other parts of the codebase", effectively increasing an engineer's working memory. Super cool. And obviously useful for an AI engineer as well. Thanks for the writeup!
So that's how that is going ;)
"We ran into an error processing your request. Please try again"
I also enjoyed the tech write-up. It's good to see REAL substantial engineering like this which is both highly impressive and highly productized.