I am all for using a source control system for your documents; I usually use RCS. But giving AI access to your docs? No thanks. If I upload any of my docs to a public server (which very rarely happens), they are compressed and encrypted to make sure only I and a few people can view them.
The author at least acknowledges the point of files is to be read by humans.
Also, the article is talking specifically about public docs meant to be used by others, not ones you're specifically trying to keep private.
A) accept that my library exists, and has its uses (it's a tough world out there for canvas-focussed JS libraries that aren't Fabric.js, Konva.js or Pixi.js)
B) learn how to write code using my library in the best way possible (because the vibes ain't going away, so may as well teach the Agents how to do the work correctly)
Plus, writing the documentation[1] for a library I've been developing for over 10 years has turned into a useful brain-dumping activity to help justify all the decisions I've made along the way (such as my approach to the scene graph). I'm not going to be here forever, so might as well document as much as I can remember now.
[1] - https://scrawl-v8.rikweb.org.uk/docs/reference/index.html
GitHub Pages came out in 2008.
http://literateprogramming.com/
c.f.,
I used my software and R Markdown documents to help address such problems. In the source code, you have:
// DOC SNIPPET BEGAN: example_api_usage
/**
*/
function amazing_function( char life, long universe, string everything ) {
}
// DOC SNIPPET ENDED
In the R Markdown you write an R function to parse all snippets, then refer to snippets by name. If a snippet can't be found, building the documentation fails and noisily breaks the CI/CD pipeline. What's nice is that you can then use this to parse C++ definitions into Markdown tables to render nicely formatted content.
The general idea is that you can have "living" documentation reference source code and break on mismatch. Whether you use knitr/pandoc or python or KeenWrite/R Markdown[1] is an implementation detail.
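A minimal Python sketch of that idea, not the commenter's actual R implementation: it assumes the `// DOC SNIPPET BEGAN: <name>` and `// DOC SNIPPET ENDED` markers shown above, and the names `parse_snippets`, `get_snippet` and `SnippetError` are hypothetical. A docs build would call `get_snippet` for each reference and let the exception fail the pipeline.

```python
import re

# Marker format assumed from the example above; adjust to your own convention.
BEGIN = re.compile(r"^\s*// DOC SNIPPET BEGAN:\s*(\S+)\s*$")
END = re.compile(r"^\s*// DOC SNIPPET ENDED\s*$")


class SnippetError(Exception):
    """Raised when a snippet is malformed or missing, so the docs build fails."""


def parse_snippets(source: str) -> dict[str, str]:
    """Map each snippet name to the text between its BEGAN and ENDED markers."""
    snippets: dict[str, str] = {}
    name = None
    body: list[str] = []
    for line in source.splitlines():
        begin = BEGIN.match(line)
        if begin:
            if name is not None:
                raise SnippetError(f"snippet {name!r} was never ended")
            name, body = begin.group(1), []
        elif END.match(line):
            if name is None:
                raise SnippetError("DOC SNIPPET ENDED without a matching BEGAN")
            snippets[name] = "\n".join(body)
            name = None
        elif name is not None:
            # Only lines inside an open snippet are collected.
            body.append(line)
    if name is not None:
        raise SnippetError(f"snippet {name!r} was never ended")
    return snippets


def get_snippet(snippets: dict[str, str], name: str) -> str:
    """Look up a snippet by name; a missing name is a hard error, not a warning."""
    try:
        return snippets[name]
    except KeyError:
        raise SnippetError(f"snippet {name!r} not found") from None
```

Raising instead of returning an empty string is the point of the design: a renamed or deleted snippet stops the build rather than silently producing stale docs.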
Our set up is:
packages/
↳ server
↳ app
↳ docs
We use mintlify for the docs; it just points to the markdown files in the docs folder. And then there's a line in the claude.md to always check /docs for updates after adding new code.
Polyrepos are workable; the way to do it is to actually version, ship, and document every subcomponent. When I say ship, I really mean ship, as in a .deb package or python wheel with a version number, not a commit hash. AI can work with this as well, as long as it has access to the docs (which can also be AI-generated).
That means a subcomponent can just make a needed change in the supercomponent as well, then test and ship the subcomponent without excess ceremony and releases.
Then you ask marketing or support to open a PR. That is usually where the markdown honeymoon ends.
* redundancy with the code: if code samples can be generated from the code, why bother duplicating them? what do they add? can they not be llm-generated later? and possibly kept somewhere out of the way (like, a website) so as not to clutter the codebase with redundancy
* if you do go for this duplication, then you are on the hook for ensuring it's always up-to-date otherwise it becomes worse than duplicate: misleading
So my preference is, when adding something to the repo, think very hard whether this information is redundant or not. Handcrafted docs, notes, comments that add more context like why was this built that way after a ton of deliberation - yes. Anything that is trivially derived from the code itself - no.
GitHub Pages serving directly from a /docs folder makes it even simpler, no separate deploy, no separate CMS, no drift. The less infrastructure between writing and publishing, the more likely docs actually get maintained.
'You must write docs. Docs must be in your repo. You must write tests. You must document your architecture. Etc. Etc.'
These were all best practices before LLMs existed and they remain so even now. I have been writing extensive documentation for all my software for something like twenty years now, whether it was for software I wrote for myself, for my tiny open source projects or for businesses. I will obviously continue to do so and it has nothing to do with:
> AI changes the game
The reason is simply that tests and documentation are useful to humans working on the codebase. They help people understand the system and maintain it over time. If these practices also benefit LLMs then that is certainly a bonus, but these practices were valuable long before LLMs existed and they remain valuable even now regardless of how AI may have changed the game.
It is also a bit funny that these considerations did not seem very common when the beneficiaries were fellow human collaborators, but are now being portrayed as very important once LLMs are involved. I'd argue that fellow humans and your future self deserved these considerations even more in the first place. Still, if LLMs are what finally motivate people to write good documentation and good tests, I suppose that is a good outcome since humans will end up benefiting from it too.
Including future you
It has the effect of finally forcing people to think about the software they're making, assuming they care about quality. If they didn't, then it's not practically different from an insecure low-code app or something copy-pasted from 15 year old StackOverflow answers.
About 95% of the work needed to make LLMs happy is just general purpose better engineering. Unit tests? Integration tests? CI? API documentation? Good examples? All great for humans too!
I consider this largely a good thing. It would be much worse if the changes needed for Happy LLMs were completely different than what you want for Happy Humans! Even worse would be if they were mutually exclusive.
It's a win. I'll take it.
Everything regarding AI-assisted development is basically training wheels for the young people coming into the workplace.
Many doomers are running around saying the future is grim because everything will be made for AI agents to use rather than humans. But so far everything done to push that agenda has looked more like a big de-enshittification.
Another one is Model Context Protocol, which brings forth the cutting edge (for 1970) idea of using a standard text based interface so that separate programs can interoperate through it.
If the cost of having non-user-hostile software is to let AI bros run around thinking they invented things like stdin and documentation, I'm all for it at this point.
If any AI bros are reading this here's another idea. Web pages that use a mostly static layout and a simple structure would probably be a lot easier for AI to parse. And google, it would be really beneficial to AI agents if their web searches weren't being interfered with by clickjacking sites such as Pinterest.
With AI code assistants I personally spend 90% of my time/tokens on design and understanding. That means creating docs that represent the feature and the changes needed to implement it, so I can really see the value of this approach growing over time. Software engineering is evolving to be less about writing the code and more about designing the system, and this supports that trend.
In the end I don't think AI has fundamentally changed the benefit/detractor equation; it is just emphasizing that docs are part of the code and making it more obvious that putting them in the code is generally pretty beneficial.
The great talk "No Vibes Allowed" put me to the far end of the other extreme - persistent long term state on disk is bad. Always force agents to rebuild, aggressively sub agent or use tools to compress context. The code should be self documenting as much as possible and structured in a way such that it's easy to grep through it. No inline docs trying to describe the structure of the tree (okay, maybe like, 3 at most).
I don't have the time to build such an elaborate testing harness as they do though. So instead I check in a markdown jungle in ROOT/docs/* and garbage collect it aggressively. Most of these are not "look for where the code is"; they are plans of varying length, ADRs, bug reports, etc., and they all can and *will* get GC'ed.
I still use persistent docs but they're very spare and often completely contractual. "Yes, I can enumerate the exact 97 cases I need to support, and we are tracking each of these in a markdown doc". That is fine IMO. Not "here let me explain what this code does". Or even ADRs - I love ADRs, but at least for my use case, I've thrown out the project and rewritten from scratch when too many of them got cluttered up... Lol.
I'm also re-implementing an open source project (with the intent of genuinely making it better as a daily user, licensed under the same license, and not just clean rooming it), which makes markdown spam less appealing to me. I kind of wish there was yet another git wrapper like jujutsu which easily layered and kept commits unified on the same branch but had multi-level purposes like this. Persistent History for some things is not needed, but git as a wrapper for everything is so convenient. Maybe I just submodule the notes....
Note: my approach isn't the best. Heck, 1 month ago OpenAI wrote an article on harness engineering where they had many parallel agents working, including some which aggressively garbage collected. They garbage collected in the sense that yes, prolific docs point agents to places XYZ, but if something goes out of date, the docs get synced. Again, that works if you have a huge compute basin. But for my use cases, my approach is how I combatted markdown spam.
jaredcwhite•2h ago