This seems super useful though. I'll try it out with the RR7 docs and see how well it works.
This of course only works for the “stop recommending X” part of your problem, but maybe something like the project here helps fill in with up-to-date documentation too?
Both Cursor and Roo also support URL context additions, which downloads the page and converts it to a machine readable format to include in context. I throw documentation links into my tasks with that all the time, which works out because I know that I am going to be sanity checking generated code against documentation anyway.
Which is fine, but is there a description of the format distinct from this particular use? (I'm playing around with these same knowledge representation and compression ideas but for a different domain, so I'm curious about the ideas behind this format)
Edit: After a longer look, this needs more polish. In addition to the key question someone else raised about quality, there are signs of rushed work here. For example, the critical llm_min_guideline.md file, which tells the LLM how to interpret the compressed version, was lazily copy-pasted from an LLM response without even removing the LLM's commentary:
"You are absolutely right! My apologies. I was focused on refining the detail of each section and overlooked that key change in your pipeline: the Glossary (G) section is no longer part of the final file..."
Doesn't exactly instill confidence.
Really nice idea. I hope you keep going with this as it would be a very useful utility.
> $ llm-min --help
> Usage: llm-min [OPTIONS]
> Generates LLM context by scraping and summarizing documentation for Python libraries.
For this to "work" you need a metric showing that AIs perform as well, or nearly as well, as they do with the uncompressed documentation on a wide range of tasks.
You can use success rate % over N runs for a set of problems, which is something you can compare to other systems. A separate model does the evaluation. There are existing frameworks like DeepEval that facilitate this.
Having data is how we learn and build intuition. If your experiments showed that modern LLMs were able to succeed more often when given the llm-min file, then that’s an interesting result even if all that was measured was “did the LLM do the task”.
Such a result would raise a lot of interesting questions and ideas, like the possibility of SKF increasing the model's ability to apply new information.
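To make that concrete, the harness can be as simple as the sketch below. Everything in it is hypothetical: the "libfoo" task list, `run_agent`, and `judge` are placeholders you would wire up to your own agent and to an evaluator model or test suite; none of it is part of llm-min.

```python
import statistics

# Hypothetical benchmark: prompts that require knowledge of a less-known library ("libfoo").
TASKS = [
    "Write a function that schedules a retry using libfoo's Scheduler API.",
    "Parse a config file with libfoo and return the active profile.",
]
N_RUNS = 10

def run_agent(prompt: str, context_path: str) -> str:
    """Placeholder: call your coding agent with the prompt plus the given context file."""
    raise NotImplementedError

def judge(prompt: str, output: str) -> bool:
    """Placeholder: a separate evaluator model (or a test suite) decides pass/fail."""
    raise NotImplementedError

def success_rate(context_path: str) -> float:
    per_task = []
    for prompt in TASKS:
        passes = sum(judge(prompt, run_agent(prompt, context_path)) for _ in range(N_RUNS))
        per_task.append(passes / N_RUNS)
    return statistics.mean(per_task)

# Same tasks, two context files: the compressed version vs. the full documentation.
print("llm-min file:", success_rate("libfoo/llm-min.txt"))
print("full docs   :", success_rate("libfoo/llm-full.txt"))
```

Running the same tasks once with the llm-min file and once with the full docs gives two directly comparable success rates.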
The job of any context retrieval system is to retrieve the relevant info for the task so the LLM doesn't hallucinate. Maybe build a benchmark based on less-known external libraries with test cases that can check the output is correct (or with a mocking layer to know that the LLM-generated code calls roughly the correct functions).
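A cheap stand-in for that mocking layer (sketch only; `lib`, the generated snippet, and the expected calls are all made up) is to execute the generated code against a MagicMock and assert that roughly the right API surface was touched:

```python
from unittest import mock

# Hypothetical: a MagicMock stands in for the real library, so we only check
# *which* functions the generated code tries to call, not what they do.
fake_lib = mock.MagicMock()

generated_code = """
client = lib.Client(api_key="test")
client.alarms.set(when=60)
"""

# Run the LLM-generated snippet with the mock injected under the expected module name.
exec(generated_code, {"lib": fake_lib})

# Did the generated code call roughly the correct functions with plausible arguments?
fake_lib.Client.assert_called_once_with(api_key="test")
fake_lib.Client.return_value.alarms.set.assert_called_once_with(when=60)
```

It won't catch subtle semantic mistakes, but it's enough to automatically score whether the model reached for the right API at all.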
To cherry-pick a tiny example: this format wouldn't capture the fact that Cloudflare Durable Objects can only have one alarm at a time, and each set overwrites the old one. The model will happily architect something around a single object, expecting to be able to set a bunch of alarms on it. Maybe I'm wrong and this tool would capture that correctly in a description. But this is just one small example.
For much of a framework or library, maybe this works. But I feel like, for this to be most effective, the proposed spec may need an update to include a little more context.
I hope this matures and works well. And there's nothing stopping me from filling in gaps with additional docs, so I'll be giving it a shot.
As it stands, there is nothing demonstrating that this lossy compression doesn't destroy essential information that an LLM would need.
I also have a gut feeling that the average LLM will actually have more trouble with the dense format + the instructions to decode it than a huge human-readable file. Remember, LLMs are trained on internet content, which contains terabytes of textual technical documentation but 0 bytes of this ad-hoc format.
I am happy to be proven wrong on both points (LLMs are also very unpredictable!), but the burden of proof for an extravagant scheme like this lies solely on the author.
Again, I'm not saying the solution doesn't work well (my intuition on LLMs has been wrong enough times), but it would be really helpful/assuring to see some hard data.
Edit: not quite.
But what I find works best is cloning doc sites directly into my repos in a root context folder. I have bash scripts for managing those, and I instruct Claude how to use them. I don't like Context7, for the same reasons I don't hook up any MCP to Claude Code.
It ain't much, but it's simple and I control it.
This has done wonders for improving our results when working with TanStack Start or shadcn/ui or whatever.
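Roughly, those scripts just clone or pull each docs repo into a context folder. A minimal Python equivalent of the idea (the repo URL and the `context/docs/` layout here are made up; point them at whatever your project actually uses):

```python
import subprocess
from pathlib import Path

# Hypothetical doc sources; replace with the repos your project actually depends on.
DOC_REPOS = {
    "libfoo": "https://github.com/example/libfoo-docs",
}
CONTEXT_DIR = Path("context/docs")

def sync_docs() -> None:
    """Clone each docs repo on first run, fast-forward pull on later runs."""
    CONTEXT_DIR.mkdir(parents=True, exist_ok=True)
    for name, url in DOC_REPOS.items():
        target = CONTEXT_DIR / name
        if target.exists():
            subprocess.run(["git", "-C", str(target), "pull", "--ff-only"], check=True)
        else:
            subprocess.run(["git", "clone", "--depth", "1", url, str(target)], check=True)

if __name__ == "__main__":
    sync_docs()
```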
I guess there are pieces of this that would be helpful to us, but there's too much setup work for me to mess with it right now; I don't feel like generating a Gemini API key, installing Puppeteer, etc.
I already have all the docs pulled down, but reducing the number of tokens used for my LLM to pull up the doc files I'm referencing is interesting.
Is there a command-line tool anyone has had luck with that just trims down a .md file but still leaves it in a state the LLM can understand?
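Something crude along these lines is easy to hack up yourself; a sketch (the heuristics below are arbitrary examples, not a recommendation, and the thresholds would need tuning to your docs):

```python
import re
import sys

def trim_markdown(text: str) -> str:
    """Crude trimmer: drop long code blocks, images, and raw HTML; collapse blank lines."""
    kept: list[str] = []
    in_code = False
    code_block: list[str] = []
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            if in_code:
                # Heuristic: keep short code examples, drop very long ones.
                if len(code_block) <= 20:
                    kept.extend(code_block)
                    kept.append(line)
                in_code = False
            else:
                in_code = True
                code_block = [line]
            continue
        if in_code:
            code_block.append(line)
            continue
        # Skip image embeds and raw HTML lines; they rarely help the model.
        if re.match(r"\s*(!\[|<[a-zA-Z/!])", line):
            continue
        kept.append(line)
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip() + "\n"

if __name__ == "__main__":
    # Usage: python trim_md.py < input.md > trimmed.md
    sys.stdout.write(trim_markdown(sys.stdin.read()))
```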
It's not obvious to me that this is a good idea. LLMs are trained on human-readable text.
The author notes that non-reasoning LLMs struggle with these SKFs. Maybe that's a hint that human readable summaries would perform better? Just a guess.
Or perhaps a vector store? [1]
[1]: Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs | https://www.microsoft.com/en-us/research/blog/introducing-kb...
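For comparison, a minimal vector-store version is only a few lines. This sketch assumes the sentence-transformers library for embeddings and uses a deliberately naive blank-line chunker; the docs path is made up:

```python
from sentence_transformers import SentenceTransformer, util

# Naive chunking: split the docs on blank lines. Real systems use smarter chunkers.
with open("docs/libfoo-full.md", encoding="utf-8") as f:
    chunks = [c.strip() for c in f.read().split("\n\n") if c.strip()]

model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_emb = model.encode(chunks, convert_to_tensor=True)

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the k doc chunks most similar to the query, to drop into the LLM's context."""
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, chunk_emb)[0]
    top = scores.topk(min(k, len(chunks)))
    return [chunks[i] for i in top.indices.tolist()]

print("\n---\n".join(retrieve("How do I schedule an alarm?")))
```

The retrieved chunks go into context verbatim as ordinary human-readable text, so nothing needs to be decoded.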
It's a vibe-coded project, and it definitely feels like a vibe-coded fun hack idea (and there's nothing wrong with having a little nerd fun!).
Some more critical thoughts, in case you are motivated to push this forward:
Biggest critical point: I think there may also be a misunderstanding of the llms.txt vs. llms-full.txt proposals, and a conflation of the former with the latter. A "minified" version is likely redundant, since the intention of llms.txt (as opposed to llms-full.txt) is:
> `llms.txt` is an index file containing links with brief descriptions of the content. An LLM or agent must follow these links to access detailed information.
> `llms-full.txt` includes all the detailed content directly in a single file, eliminating the need for additional navigation.
> A key consideration when using `llms-full.txt` is its size. For extensive documentation, this file may become too large to fit into an LLM's context window. (https://langchain-ai.github.io/langgraph/llms-txt-overview)
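To make the distinction concrete, an llms.txt index is just a short Markdown file of links for an agent to follow; the names and URLs below are invented purely for illustration:

```
# libfoo

> libfoo is a scheduling library for Python.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): install and run a first job
- [API reference](https://example.com/docs/api.md): full reference for the public API

## Optional

- [Changelog](https://example.com/docs/changelog.md)
```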
There's also lots of folks working together trying to figure this stuff out. Reach out to them and join their communities!
And if you're up for some more feedback:
- Curated knowledge < stochastic summarization -- One-shot minification of a product's core domain knowledge seems likely to be less effective, and inherently riskier, than a domain-expert human "minifying" it (most professional/technical writing already does a kind of minification via progressive disclosure: introductions, quick-start guides, navigation, etc.). If you want to convince people your way is the way, write evals showing it beats a human expert at achieving agentic outcomes more efficiently and effectively (i.e., more reliably leads to the desired outcome).
- You introduce a new standard (which feels very vibe-coded...). It's very specific, rigidly structured, verbose, and nested, and it introduces additional semantic nodes where none existed before (e.g. instead of using the actual names of things, it creates new names and nests the old ones underneath). This could lead to fairly long compute times and misses as the model goes back and forth trying to understand the larger semantic surface area.
- Confusing file types -- .txt AND .md files? Isn't llms.txt already Markdown? And in your new llms-min.txt file, the text is actually more like code (highly structured data), effectively its own DSL.
- Unintended (?) prompt injections -- e.g., "You are an expert AI Software Engineer." in your examples (https://github.com/marv1nnnnn/llm-min.txt/blob/main/sample/c...)
- Your token-reduction numbers just count the tokens in the file; they're not actually informative about the token (and compute) costs of an LLM actually trying to use it. If an LLM is going to have to reason extensively about the file to grok it, the token count of the static file is largely meaningless.
- Production-ready agents that need a shorter version of texts are likely already equipped with chunking, text search, semantic grouping, and other tools they can call when encountering a large corpus, without flooding their context window with tokens. These will typically be set up by an agent developer to deliver maximally efficient and effective tokens for their task and capabilities.
Anyway, none of the above was vibe-commented. But since you are vibe-coding this, maybe throw it all at your RooCode agent to develop a plan to address this critical feedback? :D Happy vibing!
> If you've ever used an AI coding assistant (like GitHub Copilot, Cursor, or others powered by Large Language Models - LLMs), you've likely encountered situations where they don't know about the latest updates to programming libraries. This knowledge gap exists because AI models have a "knowledge cutoff" – a point beyond which they haven't learned new information.
This isn’t quite right. LLMs don’t memorize APIs because they aren’t trained to do so in the first place. LLMs are intuitive algorithms; if you want them to (reliably…) follow a finite set of formal rules, then you’re gonna need RAG either way.
Seems like it could be a nice addition to Aider to supplement repo maps.