The most immediate benefits for me are easier inspection and searching of the code in any text editor, and infinitely nicer version control. But it also lets you run and import the Notebook as if it were a Python script!
While writing my thesis I have also been experimenting with a Spyder-like workflow in VS Code, where you insert "# %%" markers to separate code cells and run them in an IPython console. It had its perks, like the better IntelliSense, and it gave a similar mix of interactivity and a runnable file. Not as good on the markup front, though.
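For readers who haven't seen the format, a minimal sketch of what such a cell-based file looks like (the cell contents here are invented for illustration):

```python
# Each "# %%" marker starts a cell that VS Code (or Spyder) can send to
# an IPython console on its own, while the file remains a plain runnable
# Python script.

# %% Load the data (cell 1)
import statistics

samples = [2.1, 2.5, 1.9, 2.4]

# %% Compute summary statistics (cell 2)
mean = statistics.mean(samples)
print(f"mean = {mean:.2f}")

# %% [markdown]
# Cells tagged [markdown] hold prose, but it is rendered as plain
# comments in the editor, which is the "markup front" weakness
# mentioned above.
```

Since the markers are just comments, the same file runs unchanged with `python script.py`.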
I feel like it needs its own IDE, because now apart from the coding abstractions you also have named snippets.
It happens in some forms of Bank Python, but there's not much of it going on in the public/open-source world. I think because the advantages for a lone developer are small, and it's hard to maintain for an internet-based project since globally distributed databases are still expensive, bad, or both.
Maybe a tool like the one presented here could work as a language server proxy to the underlying language's server. The presence of literate text alone doesn't seem to be the main issue, it's getting the code portions parsed, checked, and annotated with references that matters.
Obviously, the type checking will be a bit more limited for code snippets you haven't finished. But especially for image-based environments, it should handle everything already loaded in the image just fine.
CWEB, which is the one that Knuth prefers, even supports step debugging. Has supported it for decades, at this point.
https://github.com/WillAdams/gcodepreview/blob/main/literati...
which allows me to have an ordinary .tex file:
https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...
which outputs multiple .py and .scad files and generates a .pdf with nice listings-based code blocks, ToC, index, hyperlinks, &c.:
https://github.com/WillAdams/gcodepreview/blob/main/gcodepre...
The notable downsides are that the .sty and .tex files have to be customized for the filenames which one can output, and I haven't been able to get auto-line numbering working between code blocks, so one has to manually manage the counters.
cf leo editor for literate programming in python [0]
Yes, markdown has code blocks, and notebooks have embedded code in documentation since Mathematica in the 1980's. It is possible to get IDE support in such blocks.
But for literate programming, weaving/tangling sources is needed to escape the file structure, particularly when the build system imposes its own logic, and sometimes one needs to navigate into the code. Leo shows how complicated the semantics of weaving can get.
Eclipse as an IDE was great because their editor component made it easy to manage the trick of one editor for many sources, and their markers provided landmarks for cross-source navigation and summaries.
Late 80's, very late ... but the concept of "notebooks" predates Mathematica by at least a decade (it was very common to embed structure in source code files with markup).
It’s interesting that using LLMs is making very explicit that “someone” needs to read the code and understand it. So having good comments and making code readable is great both for AI and humans
1: “Writing documentation for AI: best practices” https://news.ycombinator.com/item?id=44311217
I avoid code comments where I can because English is way less precise than code, it's an extra chore to keep the comments and code in sync, and when the comments and code inevitably get out of sync it's confusing which one is the source of truth. Does literate programming sidestep this somehow? Or have benefits that outweigh this?
I think where it shines, is where it helps you break the code up, without having to break it up in a way that makes sense for the computer. Show an outline, but then drill into a section. The overall function can then be kept as a single unit, and you can sort of punt on sub sections. I tried this just recently in https://taeric.github.io/many_sums.html. I don't know that I succeeded, necessarily. Indeed, I think I probably should have broken things into more sections. That said, I did find that this helped me write the code more than I expected it to. (I also was very surprised at how effective the goto style of thinking was... Much to my chagrin.)
I will have to look again at some of the code I've read this way.
To directly answer the question of if it helped keep the documentation in sync, as it were, that is tough. I think it helps keep the code in a section directly related to the documentation for that section. All too often, the majority of code around something is not related to what you were wanting to do. Even the general use of common code constructs gets in the way of reading what you were doing. Literate programming seems the best way I have seen to give the narrative the ability to say "here is the outline necessary for a function" and then "this particular code is to do ..." Obviously, though, it is no panacea.
Literate programming seems fine for heavily algorithmic stuff when there's a lot of explaining to do compared to the amount of code and the code is linear, but I was more thinking about how it works for common web apps where it's lots of mundane code that criss-crosses between files.
- gcodepreview.py (gcpy) --- the Python functions and variables
- pygcodepreview.scad (pyscad) --- the Python functions wrapped in OpenSCAD
- gcodepreview.scad (gcpscad) --- OpenSCAD modules and variables
as explained in: https://github.com/WillAdams/gcodepreview/blob/main/gcodepre... and it worked quite well (far better than the tiled set of three text editor windows which I was using at first) and I find the ability to sequence the code for the separate files in a single master file very helpful.
I say oddly, as I don't think I've seen it done for common web apps. I suspect that is largely because frameworks have not been a stable foundation to build on in a long time?
I can't help but think the templates of old were a hint at how it would have worked fine? Have a section of the literate code that outlines the general template of a file, and where the old "your code goes here" comments used to denote where you add your logic, there is instead another section that you can discuss on its own. (Anyone else remember those templates? They were common in app builders, if I recall correctly.)
Usually the problem with comments is that there are too few of them.
I've worked in a few code bases where many of the comments could be removed by using better function names, better variables names, and breaking complex conditionals into named subexpressions/variables.
And there was a fair chance comments were misleading or noise e.g. `/* send the record for team A */ teamB.send(...`, `/* if logged in and on home page */ if (!auth.user && router.name === 'home') ...`, `/* connect to database */ db.connect()`. I'd much rather comments were used as a last resort as they're imprecise, can be bandaids for code that's hard to read, and they easily get out of sync with the code because they're not executed/tested.
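The kind of refactor described above might look like this (a hypothetical sketch; the function and variable names are invented):

```python
# Before: the intent lives in a comment that can silently drift out of
# sync with the condition it describes.
def render_before(user, page):
    # if logged in and on home page
    if user is not None and page == "home":
        return "welcome banner"
    return "login prompt"

# After: the intent lives in named subexpressions, so the "comment" is
# code that gets executed and tested.
def render_after(user, page):
    is_logged_in = user is not None
    on_home_page = page == "home"
    if is_logged_in and on_home_page:
        return "welcome banner"
    return "login prompt"
```

The behavior is identical, but in the second version the explanation can't rot independently of the logic.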
A block of comments to explain high-level details of complex/important code, or comments to explain the why or gotchas behind non-obvious code are useful though.
But even then my experience doesn’t match yours. So you have some code. Who decided it would be that way? Do you have a picture of how it should look? Can you share a link to where you got this information? What problem led you to do this non-obvious thing?
This had nothing to do with literate programming. I could as well ask "how often is the English in comments repeating what's already written in code?"
Yeah, it can look a bit repetitive if the code is already clear, but the context of why a thing is being done is still valuable. In the modern era with LLM tools, I'm sure it could be even more powerful.
Is that because of literate programming, or is that because practicing literate programming made you focus more on writing high quality code and docs?
But the specifics of the flow aside, it's the mindset difference that makes it all feel special. The docs are the primary artifact. The code is secondary.
In an era of Copilot-style inline suggestions, taking the time to write a lengthy description effectively feeds the prompt to get a better output.
I can definitely see such a practice improving LLM output.
Meanwhile, there are programmers that think comments are a "code smell".
"Literate programming (LP) offers 2 classical operations:
Tangle: Extract the source code blocks and generate real working code files for further compilation or execution, eventually outside of Emacs.
Weave: Export the whole Org file as literate, human-readable documentation (generally in HTML or LaTeX)."
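As a rough illustration of the tangle operation quoted above (a toy sketch, not org-babel itself — the document and helper are invented, and real org-babel handles far more):

```python
import re

# A toy Org-style document: prose interleaved with named source blocks,
# each declaring the file it should be tangled into.
ORG_DOC = """\
* Greeting
Some prose explaining the code.
#+begin_src python :tangle hello.py
print("hello")
#+end_src
More prose.
#+begin_src python :tangle hello.py
print("world")
#+end_src
"""

def tangle(text):
    """Collect src-block bodies, grouped by their :tangle target file."""
    files = {}
    pattern = re.compile(
        r"#\+begin_src \S+ :tangle (\S+)\n(.*?)#\+end_src",
        re.DOTALL,
    )
    for target, body in pattern.findall(text):
        files.setdefault(target, []).append(body)
    return {name: "".join(parts) for name, parts in files.items()}

# Blocks aimed at the same file are concatenated in document order.
print(tangle(ORG_DOC)["hello.py"])
```

Weaving is the complementary direction: exporting the whole document, prose and code together, as human-readable HTML or LaTeX.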
[1] https://org-babel.readthedocs.io/en/latest/

- https://xenodium.com/ob-swiftui-updates
- https://github.com/xenodium/ob-dall-e-shell
I don't mean that to sound dismissive. This might be the most popular tool out there for all I know, and so well done that it hasn't needed any updates in ages.
All of which shows why there are so many Literate Programming solutions/implementations --- it's a fairly simple problem (though I'm a lame programmer, so had to get help on tex.stackexchange) and it's easy to roll a solution to scratch a particular itch.
One often overlooked cute aspect of LP is how a digression on code you tried, but chose not to incorporate, is first class, with the same highlighting etc. as active code. It isn’t relegated to a monochrome, non-syntax-highlighted, awkwardly indented block comment. I find this very appropriate, and it encourages documenting “tried but failed” experiments, which can be incredibly useful.
Edit: another really cool benefit of lp: the “examples” chapter = tests. You can tangle the examples into a test script and run them in CI. Very satisfying.
I would not say it is a good way. The truth is that code cannot be self-documenting, so documentation is necessary. But literate documentation is not the right way to do it.
First, it is too crafty. Good documentation should be formal and have a definite, standard structure that is repeated across all similar projects. It should also have an encyclopedia-like form: short standalone pieces interlinked into a bigger whole, so you can start anywhere instead of reading from start to end.
Second, it is too programmatic. It has real code. But real code is not good to describe what is important. Try porting METAFONT to something else. METAFONT code is fully documented. In Pascal. And you want to draw it with JavaScript. It would be way more helpful to describe these algorithms without tying them to Pascal or any other specific notation. And then, once they are described in this form, add a document that maps them to a Pascal implementation.
That comment about beginners is a nod to the sibling comment explaining how it is useful for a beginner. I've never found it useful myself, but I can see the value.
"TeX: The Program" is a joy to read.
You could theoretically write a literate program that is nothing but code, if the code is so readable that it doesn’t need explaining. The distinction is that it is “human first” over “computer first”.
Referring to specific literate programs would make this comment easier to believe. Even my very first literate programs avoided this trait, so I really cannot relate.
It sounds like you are describing trivial in-line comments instead of chunks of programs interspersed with explanations.
When I write Literate Programs, it's mostly for future me so that I can remember why a particular approach was taken, or what the significance of two slightly differently named variables is and why they are not interchangeable. If it's for others to use, then user documentation is a specific section of the program code, and possibly a totally separate document (only written after the fact, when the UI and so forth are stable enough that things won't change).
FWIW, I am working to reimplement parts of METAFONT in my current project (need the curves, and want an implementation which will also allow me to write out a .mp file) and I'm finding _METAFONT: The Program_ very helpful.
I have a reimplementation of parts of METAFONT as a hobby project. In fact, it is no longer METAFONT; it is a language based on METAFONT. However, I used the same Hobby's algorithm to generate curves. It understands pens and paths and renders them to an OpenGL texture. The results are not compatible with Knuth's METAFONT, as I use floating-point numbers instead of fixed-point arithmetic. It is still under development and needs some cleaning, but if what you are developing is for personal use or is free software compatible with the GNU Affero GPL, perhaps parts of the code, or even the whole, could be of use.
And nowadays, maybe LLMs could check for inconsistencies between the docs and the code?
My script doesn't try to shuffle code around, which has the advantage that if you are familiar with the general structure of the code, the documentation follows the same structure. (This may not be the best order to explain the theory of the program, as is noted in several introductions to literate programming).
Instead of code blocks, my little shell script has a per-line approach, where each code line is preceded by the name of the file to which it is extracted. This approach allows me to name a variant immediately after the filename, so that I can code alternative lines, and decide at the time of extraction which sets of lines to use. This is also useful for extracting multiple very similar files from a single markdown source. This use of variants has been very effective in supporting alternative implementations, since I can quickly switch between them by the list of variants I give the tool to extract.
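A sketch of that per-line scheme (the prefix syntax and names here are invented to illustrate the idea, not the commenter's actual tool): each line carries its target file plus an optional variant tag, and extraction filters on both.

```python
# Each source line is "filename[.variant]: code". Untagged lines always
# belong to the file; tagged lines are kept only when their variant is
# among the enabled ones, which lets one markdown source yield several
# alternative implementations.
SOURCE = """\
main.py.fast: def compute(): return 2
main.py.slow: def compute(): return 40 + 2
main.py: print(compute())
"""

def extract(source, filename, variants):
    out = []
    for line in source.splitlines():
        prefix, _, code = line.partition(": ")
        if prefix == filename:
            out.append(code)
        elif prefix.startswith(filename + "."):
            variant = prefix[len(filename) + 1:]
            if variant in variants:
                out.append(code)
    return "\n".join(out)

print(extract(SOURCE, "main.py", {"fast"}))
```

Switching implementations is then just a matter of changing the set of enabled variants at extraction time.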
tony_cannistra•7mo ago
[1]: https://en.wikipedia.org/wiki/Noweb
onair4you•7mo ago
dunham•7mo ago
I don't remember why I selected nuweb, other than it worked with any language, but it looks like it was inspired by noweb. I had learned about literate programming from studying TeX.
zimpenfish•7mo ago
(The username being `partingr` suggests it was some time late 92 to mid 95 whilst I was at cs.man.ac.uk)
https://github.com/nrnrnr/noweb/tree/master/contrib/partingr