How do context collation companies work?
If not, inserting new context any place other than at the end will cause cache misses and therefore slow down the response and increase cost.
Models also have some bias toward tokens at the start and end of the context window, so there is potentially a reason to put important instructions in one of those places.
Further, instead of polluting the context of your main agent, you can run a subagent to do the search, retrieve the important bits of information, and report back to your main agent. This is what Claude Code does if you use the keyword "explore". It starts a subagent with Haiku which reads tens of thousands of tokens in seconds.
In my experience the only shortcoming of this approach right now is that it's slow, and sometimes Haiku misses some details in what it reads. Both will get better very soon (within a generation or two we will likely see Opus 4.5-level intelligence at Haiku speed and price). For now, if not missing a detail matters for your use case, you can give the output from the first subagent to a second one and ask the second one to find important details the first one missed. I've found this additional step catches most things the first search missed. You can try this for yourself with Claude Code: ask it to create a plan for your spec, then pass the plan to a second Claude Code session and ask it to find gaps and missing files in the plan.
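A minimal sketch of that two-pass check, assuming the Anthropic Python SDK and a Haiku-class model (the model id, prompts, and file name are placeholders, not anything specific to Claude Code):

    import anthropic

    client = anthropic.Anthropic()              # reads ANTHROPIC_API_KEY from the environment
    FAST_MODEL = "claude-3-5-haiku-latest"      # placeholder: any cheap, fast model

    def ask(prompt: str) -> str:
        resp = client.messages.create(
            model=FAST_MODEL,
            max_tokens=2000,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text

    material = open("search_scope.txt").read()  # whatever the subagent is meant to read

    # Pass 1: the first subagent reads the raw material and reports what matters.
    first_report = ask(
        "Read the following code and list every file, function, and detail "
        "relevant to the task.\n\n" + material
    )

    # Pass 2: a second subagent checks the first report against the same material.
    second_report = ask(
        "Another assistant produced the report below about this code. "
        "List any important details or files the report missed.\n\n"
        "REPORT:\n" + first_report + "\n\nCODE:\n" + material
    )

    # The main agent only ever sees the two short reports, not the raw material.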
Basically they're just a few kilobytes of text that are given as extra context to "explore" agents when looking at specific parts of the code.
We only think in conversational turns because that's what we've expected a conversation to 'look like'. But that's just a very deeply ingrained convention.
Forget that there is such a thing as 'turns' in an LLM convo for now; imagine that it's all 'one-shot'.
So you ask A, it responds A1.
But when you send B and expect B1 - which depends on A and A1 already being in the convo history - consider that you are actually sending all of that again anyhow.
Behind the scenes when you think you're sending just 'B' (next prompt) you're actually sending A + A1 + B aka including the history.
A and A1 are usually 'cached', but that's an optimization, not a requirement.
Without caching the model would just process all of A + A1 + B and return B1 just the same.
And then A + A1 + B + B1 + C and expect C1 in return.
It just so happens that it will cache the state of the convo at your previous turn, so it's optimized, but the key insight is that you can send whatever context you want at any time.
If, after you send A + A1 + B + B1 + C and get C1, you want to then send A + B + C + D and expect D1 (basically sending the prompts with no responses) - you can totally do that. It will have to re-process all of that, aka no cached state, but it will definitely do it for you.
Heck you can send Z + A + X, or A + A1 + X + Y - or whatever you want.
So in that sense, what you are really doing (if you're using the simplest form of the API) is sending 'a bunch of content' and 'expecting a response'. That's it. Everything is actually 'one shot' (prefill => response). It feels conversational, but that's just a structural and operational convention.
So the very simple answer to your question is: send whatever context you want. That's it.
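To make that concrete, here's a sketch with the Anthropic Python SDK; the model id is a placeholder, and the same shape holds for any chat-style API:

    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-sonnet-4-20250514"          # placeholder model id

    def one_shot(messages):
        resp = client.messages.create(model=MODEL, max_tokens=1024, messages=messages)
        return resp.content[0].text

    # Turn 1: send A, get A1.
    history = [{"role": "user", "content": "A"}]
    a1 = one_shot(history)

    # Turn 2: it feels like you send just B, but you actually resend A + A1 + B.
    history += [{"role": "assistant", "content": a1},
                {"role": "user", "content": "B"}]
    b1 = one_shot(history)

    # Nothing forces you to keep the history intact. Send the prompts with the
    # responses dropped, or anything else you like -- it just won't hit the cache.
    pruned = [{"role": "user", "content": "A\n\nB\n\nC"}]
    c1 = one_shot(pruned)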
And... and...
This results in a _very_ deep implication, which big companies may not be eager to let you see:
they are context processors
Take it for what it is.
We know that already. I don't know why we have to be quiet or hint at it; in fact they have been quite explicit about it.
Or is there some other context to your statement? Anyway that’s my “take that for what you will”.
Are you talking about manually or in an automated fashion?
Would be happy to onboard you personally.
Cursor and AI coding tools don't do it. They use agentic subtasks.
Each of the 4 responses will disagree, despite some overlap. I take the union of the 4 responses as the canonical set of files that an implementer would need to see.
This reduces the risk of missing key files, while increasing the risk of including marginally important files. An easy trade-off.
Then I paste the subset of files into GPT 5.2 Pro, and give it $TASK.
You could replace the upstream process with N codex sessions instead of N gemini chat windows. It doesn't matter.
This process can be automated with structured JSON outputs, but I haven't bothered yet.
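If you did automate it, the core is tiny: ask each model for a JSON array of file paths and take the union. A sketch, where ask_for_files() and the client.complete() call are stand-ins for whichever API you actually use:

    import json

    PROMPT = ("Given this repo map and task, return ONLY a JSON array of the file "
              "paths an implementer would need to read.\n\nTASK: {task}\n\nREPO MAP:\n{repo_map}")

    def ask_for_files(client, model, task, repo_map):
        raw = client.complete(model=model,      # stand-in for your real client call
                              prompt=PROMPT.format(task=task, repo_map=repo_map))
        return set(json.loads(raw))             # assumes the model obeyed the JSON-only instruction

    def canonical_file_set(client, models, task, repo_map):
        union = set()
        for model in models:                    # the same model N times, or N different ones
            union |= ask_for_files(client, model, task, repo_map)
        return sorted(union)                    # union: fewer misses, a few marginal extras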
It uses a lot of inference compute, but it's better than missing key inputs and wasting time on hallucinated output.
1- Better quality output due to pruning noise, while reducing the chances of missing key context.
2- Saving time/effort by not using my brain to decide which files to include.
3- ChatGPT 5.2 Pro only allows 60k tokens, so I have no choice sometimes.
It comes with costs as you identified. It's a trade-off that I am willing to pay.
Best methods I’ve observed: progressive loading (Claude skills) & symbolic search (Serena MCP).
With these I'll mostly just give it questions: what are some approaches to implement x, what are the pros and cons, what libraries are available to handle x? What data would you need to create x screen, or y report? And then let it Google it, or run queries on your data.
I'll have it create markdown documents or skills to persist the insights it comes back with that will be useful in the future.
LLMs are pretty good at plan/do/check/act: create a plan (maybe to run a query to see what tables you have in your database), run the query, understand the output, and then determine the next step.
Your main goal should be to enable the PDCA loop of the LLM through tools you provide.
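One way this can look in code, sketched with the Anthropic tool-use API (the model id, database file, and run_query tool are my assumptions for illustration, not anything specific from the comment above):

    import anthropic, json, sqlite3

    client = anthropic.Anthropic()
    MODEL = "claude-sonnet-4-20250514"          # placeholder model id

    TOOLS = [{
        "name": "run_query",
        "description": "Run a read-only SQL query against the local database.",
        "input_schema": {"type": "object",
                         "properties": {"sql": {"type": "string"}},
                         "required": ["sql"]},
    }]

    def run_query(sql: str) -> str:
        with sqlite3.connect("analytics.db") as db:     # assumed local SQLite file
            return json.dumps(db.execute(sql).fetchall()[:50])

    messages = [{"role": "user",
                 "content": "What tables do we have, and which look relevant to churn?"}]
    while True:
        resp = client.messages.create(model=MODEL, max_tokens=2000,
                                      tools=TOOLS, messages=messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":              # act: the model decides the next step
            break
        results = []
        for block in resp.content:
            if block.type == "tool_use":                # do: execute what the model planned
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": run_query(block.input["sql"])})
        messages.append({"role": "user", "content": results})   # check: feed output back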
These constraints result in token-hungry activity being confined to child scopes that are fully isolated from their parents. The only way to communicate between stack frames is by way of the arguments to call() and return(). Theoretically, recursive dispatch gives us exponential scaling of effective context size as we descend into the call graph. It also helps to isolate bad trips and potentially learn from them.
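A toy sketch of that call/return discipline; ask() is a stand-in for your model client, and the CALL/RETURN string protocol is made up purely for illustration:

    def ask(frame: list[str]) -> str:
        raise NotImplementedError("plug in your model client here")

    def call(task: str, args: str, depth: int = 0, max_depth: int = 3) -> str:
        # A fresh "stack frame": the child sees only what the parent passed in.
        frame = [f"TASK: {task}", f"ARGS: {args}"]
        while True:
            step = ask(frame)
            frame.append(step)
            if step.startswith("CALL ") and depth < max_depth:
                sub_task, sub_args = step[len("CALL "):].split("|", 1)
                # Recursive dispatch: the child can burn as many tokens as it likes,
                # but only its return value comes back into this frame.
                frame.append("CHILD RETURNED: " + call(sub_task, sub_args, depth + 1))
            elif step.startswith("RETURN "):
                return step[len("RETURN "):]    # the only thing that escapes the frame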
cursor-mirror skill: https://github.com/SimHacker/moollm/tree/main/skills/cursor-...
cursor-mirror
See yourself think. Introspection tools for Cursor IDE — 47 read-only commands to inspect conversations, tool calls, context assembly, and agent reasoning from Cursor's internal SQLite databases.
By Don Hopkins, Leela AI — Part of MOOLLM
The Problem
LLM agents are black boxes. You prompt, they respond, you have no idea what happened inside. Context assembly? Opaque. Tool selection? Hidden. Reasoning? Buried in thinking blocks you can't access.
Cursor stores everything in SQLite. This tool opens those databases.
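For a quick look at what's in there without installing the skill, you can just enumerate tables; the macOS path is my assumption from Cursor being a VS Code fork, and the skill's docs have the actual schemas:

    import sqlite3
    from pathlib import Path

    CURSOR_STORAGE = Path.home() / "Library/Application Support/Cursor/User"  # assumed macOS location

    for db_path in CURSOR_STORAGE.rglob("*.vscdb"):
        with sqlite3.connect(f"file:{db_path}?mode=ro", uri=True) as db:      # read-only, like the skill
            tables = [row[0] for row in
                      db.execute("SELECT name FROM sqlite_master WHERE type='table'")]
            print(db_path, tables)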
The Science
"You can't think about thinking without thinking about thinking about something." — Seymour Papert, Mindstorms: Children, Computers, and Powerful Ideas (Basic Books, 1980), p. 137
Papert's insight: metacognition requires concrete artifacts. Abstract introspection is empty. You need something to inspect.
This connects to three traditions:
Constructionism (Papert, 1980) — Learning happens through building inspectable artifacts. The Logo turtle wasn't about drawing; it was about making geometry visible so children could debug their mental models. cursor-mirror makes agent behavior visible so you can debug your mental model of how Cursor works.
Society of Mind (Minsky, 1986) — Intelligence emerges from interacting agents. Minsky's "K-lines" are activation patterns that recall mental states. cursor-mirror lets you see these patterns: which tools activated, what context was assembled, how the agent reasoned.
Schema Mechanism (Drescher, 1991) — Made-Up Minds describes how agents learn causal models through Context → Action → Result schemas. cursor-mirror provides the data for schema refinement: what context was assembled, what action was taken, what result occurred.
What You Can Inspect:
Conversation Structure
Context Assembly
Tool Execution
Server Configuration
MCP Servers
Image Archaeology
Python Sister Script CLI Tool: cursor_mirror.py
cursor_mirror.py: https://github.com/SimHacker/moollm/blob/main/skills/cursor-...
Here is the design and exploration and hacking session in which I iteratively designed and developed it, using MOOLLM's Constructionist "PLAY-LEARN-LIFT" methodology:
cursor-chat-reflection.md: https://github.com/SimHacker/moollm/blob/main/examples/adven...
Look at the "Scene 19 — Context Assembly Deep Dive" section and messageRequestContext schema, and "Scene 23 — Orchestration Deep Dive" section!
PR-CURSOR-MIRROR-GENESIS.md: https://github.com/SimHacker/moollm/blob/main/designs/PR-CUR...
play-learn-lift skill: https://github.com/SimHacker/moollm/tree/main/skills/play-le...
MOOLLM Anthropic compatible extended meta skill skill: https://github.com/SimHacker/moollm/tree/main/skills/skill
Specifically you can check out ORCHESTRATION.yml and other "YAML Jazz" metadata in the directory:
ORCHESTRATION.yml: https://github.com/SimHacker/moollm/blob/main/skills/cursor-...
Currently only supports Cursor running on Mac, but I'd be happy to accept PRs for Linux and Windows support. Look at the cursor-chat-reflection.md document to see how I had Cursor analyze its own directories, files, SQLite databases, and JSON schemas. Also looking for help developing mirrors and MOOLLM kernel drivers for other orchestrators like Claude Code, etc.
DATA-SCHEMAS.yml: https://github.com/SimHacker/moollm/blob/main/skills/cursor-...
dtagames•8h ago
Rules are just context, too, and all elaborate AI control systems boil down to these contexts and tool calls.
In other words, you can rig it up any way you like. Only the context in the actual thread (or "continuation," as it used to be called) is sent to the model, which has no memory or context outside that prompt.
tcdent•6h ago
There may be a day when we retroactively edit context, but the system in its current state is not very supportive of that.
vanviegen•3h ago
There's a little more flexibility than that. You can strip off some trailing context before appending some new context. This allows you to keep the 'long-term context' minimal, while still making good use of the cache.
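A sketch of that, assuming generic chat-style messages: trim only from the end of the previous request, so everything before the cut stays byte-identical and the prefix cache still covers it.

    def next_request(history, new_user_msg, drop_trailing=1):
        # history is the exact message list from the previous request.
        # Dropping only from the *end* (say, the last bulky tool result) keeps the
        # prefix byte-identical, so the provider's prefix cache still applies to it.
        kept = history[:-drop_trailing] if drop_trailing else history
        return kept + [{"role": "user", "content": new_user_msg}]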