Would a few decades help in universally having such a translator in all the tools?
With things like treesitter and the like, I sometimes daydream about what an efficient and effective HCI for an AST or IR would look like.
Things like F#'s ordered compilation often make code reviews simpler for me, but that's because a piece of the intermediate form (dependency order) is exposed to me as a first-class item. I find it much simpler to reason about compared to small changes in code with laxer ordering requirements, where I often find myself jumping back and forth in a diff and all the related interfaces, abstract classes, and implementations to understand what effect the delta is having on the program as a whole.
If you want everyone to see their own preference of format, either write a script or get AI to format it for you.
I heard this many years ago, when we used Perforce. The Perforce consultant we dealt with told us this as an example of triggers. Back then, I was told that Google was a big Perforce shop (maybe just a part of Google, I dunno).
I have heard that this was one of the goals of developing IDLs. I think the vision was that you could have a dozen different programmers working in multiple languages (for example, C for the drivers, Haskell for the engine, and Lua for the UI). Their code would be converted to a common IDL when submitted to configuration management, and then extracted from that when the user looks at it.
I can't see that working, but a lot of stuff that I used to think was crazy, has happened, so, who knows?
I was on an internal tools team doing distinctly unsexy LAMP-stack work, but all the documentation I ever saw talked about perforce/p4.
I had never heard of DIANA but I love old ideas being new again. (Plus you made me laugh)
I guess lisp still has whitespace? That seems like the only meaningful way it isn't already just what the post is describing.
(defun check-password-against-hash (password hash)
  (handler-case
      (bcrypt:password= password hash)
    (error () nil)))
There are already multiple choices on formatting (and naming, and other things) just from this sample.

In theory a system could be made where this level of code isn't what's actually stored and is just a reverse pretty-print-with-my-preferences version of the code, as the post mentions. SBCL compiles my function when I enter it, and I can ask SBCL to describe it back to me:
* (describe #'check-password-against-hash)
#<FUNCTION CHECK-PASSWORD-AGAINST-HASH>
[compiled function]
Lambda-list: (PASSWORD HASH)
Derived type: (FUNCTION (T T) *)
Source form:
(LAMBDA (PASSWORD HASH) (BLOCK CHECK-PASSWORD-AGAINST-HASH (HANDLER-CASE (CL-BCRYPT:PASSWORD= PASSWORD HASH) (ERROR NIL NIL))))
I can also ask SBCL to show me the disassembly; perhaps, again in theory, a system could be made where you can get and edit text at that level of abstraction before putting it back in.

* (disassemble #'check-password-against-hash)
; disassembly for CHECK-PASSWORD-AGAINST-HASH
; Size: 308 bytes. Origin: #xB8018AA278 ; CHECK-PASSWORD-AGAINST-HASH
; 278: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 27C: 488945F8 MOV [RBP-8], RAX
; 280: 488965D8 MOV [RBP-40], RSP
; 284: 488D45B0 LEA RAX, [RBP-80]
; 288: 4D8B7520 MOV R14, [R13+32] ; thread.current-unwind-protect-block
; 28C: 4C8930 MOV [RAX], R14
; ... and so on ....
(SBCL does actually let you modify the compiled code directly if you felt the urge to do such a thing. You just get a pointer to the given origin address and offset and write away.)

But just going back to the Lisp source form, it's close enough that you could recover the original and format it a few different ways depending on different preferences. e.g. someone might prefer the first expression given to handler-case to be on the same line instead of a new line like I did. But to such a person, is that preference universal, or does it depend on the specific expressions involved? There are other not strictly formatting preferences at play here too, like the use of "cl-bcrypt" vs "bcrypt" as package name, or one could arrange to have no explicit package name at all. My own preferences on both matters are context-sensitive. The closest universal preference I have around this general topic is that I really hate enforced format tools even if they bent to my specific desires 100% of the time.
I'd say the closest modern renditions of what the post is talking about are expressed by node editors. Unreal's Blueprints or Blender's shader editor are two examples, ETL tools are another. But people tend to work at the node level (and may have formatting arguments about the node layout) rather than a pretty-printed text representation of the same data. I think in the ETL world it's perhaps more common to go under the hood a little and edit some text representation, which may be an XML file (and XML can be pretty-printed for many different preferences) or a series of SQL statements or something CSV or INI like... whether or not that text is a 'canonical' representation or a projection would depend on the tool.
That's true, but there is a very big difference between S-expressions stored as text and other programming languages stored as text because there is a standard representation of S-expressions as text, and Common Lisp provides functions that implement that standard in both directions (READ and PRINT) as part of its standard library. Furthermore, the standard ensures READ-PRINT equivalency, i.e. if you READ the result of PRINTing an object the result is an equivalent object. So there is a one-to-one mapping (modulo copying) between the text form and the internal representation. And, most importantly, the semantics of the language are defined on the internal representation and not the textual form. So if you wanted to store S-expressions in, say, a relational database rather than a text file, that would be an elementary exercise. This is why many CL implementations provide alternative serializations that can be rendered and parsed more efficiently than the standard one, which is designed to be human-readable.
This is in very stark contrast to nearly every other programming language, where the semantics are defined directly on the textual form. The language standard typically doesn't even require that an AST exist, let alone define a canonical form for it. Parsers for other languages are typically embedded deep inside compilers, and not provided as part of the standard library. Every one is bespoke, and they are often byzantine. There are no standard operations for manipulating an AST. If you want to write code that generates code, the output must be text, and the only way to run that code is to parse and compile it using the bespoke parser that is an opaque part of the language compiler. (Note that Python is a notable exception.)
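For instance, Python ships AST parsing and unparsing in its standard library; here's a minimal sketch (ast.unparse and the indent argument to ast.dump need Python 3.9+):

import ast

source = "def add(a, b):\n    return a + b\n"

tree = ast.parse(source)          # source text -> AST, via the standard library
print(ast.dump(tree, indent=2))   # inspect the tree structure
print(ast.unparse(tree))          # AST -> source text again

Note that the round trip gives back a canonical rendering rather than the original formatting, which is very much in the spirit of the pretty-printing scheme the post describes.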
It doesn’t get much less formatted than Minified JavaScript, except maybe Perl or Brainfuck.
But I'll also mention that this pretty much already exists. You can have whitespace options for git. I also imagine there's some setup using hooks that uses one formatter locally, and another for remote.
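A hedged sketch of that kind of setup, using git's clean/smudge filters rather than hooks (black reads stdin with `-`; `my-local-format` is a stand-in for whatever pretty-printer you prefer locally):

# .gitattributes: route Python files through a format filter
*.py filter=codefmt

# canonical format goes into the repo, your own format into the working tree
git config filter.codefmt.clean  "black --quiet -"
git config filter.codefmt.smudge "my-local-format"   # hypothetical local pretty-printer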
Also, the common IR already exists - it's just the AST. It was "solved" back in the day when people were throwing whatever they could at the wall to see what sticks, since it was all so new. With the benefit of hindsight, I think we can say that it's not that good of an idea.
The plain-text encoding itself is the product of incremental, path-dependent development from Morse code signals to Unicode, resulting in a "Gigantic Lookup Table" (GLUT, my coinage) approach to symbolic comprehension. The assumption is useful - lots of features can "just work" by knowing that a particular bit pattern is always a particular symbol.
If we push up the abstraction level, we get a different set of symbols that are better suited to the app, but not equivalent GLUT tooling. Instead we usually get parsing of plain text as a transport. For example, CSV parsing. It is sloppy; it is also good enough.
Edit: XML is also a key example. It goes out of its way to respect the text transport approach. There are dedicated XML editors. But people want to edit it as plain text and they can't quite get there because funny-business with character encodings gets in the way, adding a bunch of ampersands and semicolons onto the symbols they want to edit. Thus we have ended up with "the CSV of hypertext documents", Markdown.
func main()
{
    fmt.Println("HELLOWORLD")
}
is not just non-standard formatting, but illegal Go syntax. Similarly, extra parentheses around if clauses are not allowed.

However 'if (x) == (1) {}' is totally fine with the formatter. As is an assignment of '(x) = (y)'.
It's actively annoying too because, like, extra parentheses often have important meaning.
For example, consider the following code:
if (x.isFoo() || x.isBar()) /* && x.isBaz() */ { /* code */ }
In that case, the code is obviously temporarily commented out, but go's formatting will make it so that if you comment it out like that, fmt, and then uncomment it and forget to re-add the parens, you get shot in the foot.

I've hit that far more times than it's uhh... I dunno, I guess removed parentheses I didn't want? I don't write them if I don't want them.
i wonder how many default formatting decisions are made this way (including go fmt, etc)
Something between "everything fits on one short line" and "every argument gets its own line" would be nice too. Spreading a function definition or call across ten lines when it would fit on two or three doesn't feel like an automatic win.
https://naildrivin5.com/blog/2013/05/17/source-code-typograp...
> Some of us even align other parts of our code, such repeated inline comments
> Now, the arguments block forms a table of three columns. The modifiers make up the first column, the data types are aligned in the second column, and the names are in the third column
These feel like pretty trivial routines that can be encompassed by code formatting.
We can contrive more extreme examples, like the for loop, but super-custom formatting ("typesetting") like that has always made me feel awkward; it feels like it gives license for people to use all manner of arbitrary formatting. The author has some intent, but when you run into an inconsistent codebase with lots of things going on, the variance doesn't feel informative or helpful: it sucks and it's a drain.
What's stored is perhaps more minimal, some kind of reference encoding, maybe prettier-ified for JS. The meat of this article to me is that it shouldn't matter: the IDE should let you view and edit as you like:
> Everyone had their own pretty-printing settings for viewing it however they wanted.
Now explain a declaration like "char *argv[]"...
> We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”, so it makes more sense to put the space before the argument name, not in the middle the data type’s name (update: it should be pointed out that this only makes sense for a single declaration. A construct like char* a, b will create a pointer to char, a, and a regular char, b).
Ah, yes, the delusional C++ formatting style. At least it's nice that the update provides the explanation why it should be avoided.
You also don't think about dollars differently than other units, just because the sign goes before the number.
I wouldn't draw any conclusions about autoformatters from clang-format.
(That said, it must be possible to make a more sophisticated formatter for the source code too.)
What’s the point of such a heavy obfuscation of the intent, really? Let’s take the first example.
char *
strcpy(to, from)
    register char *to;
    register const char *from;
{
    char *save = to;
    for (; (*to = *from) != 0; ++from, ++to);
    return(save);
}
If we are fine with the "lengthy" register, why not use character as a full word? Or if we want something shorter, sign would actually be semantically more on point in general.

What’s with the star to designate a pointer? Why not sign-pointer? Or pin for short if we dare to use a pretty straightforward metaphor, so sign-pin. Ah yes, by the way, using "dot" (.) or "dash, greater than" (->) is such typographical nonsense.
And as a side note *char brings nothing in readability compared to sign-pin-pin. Remember that most people read words or even word sequences as a whole. And let’s compare **char to something like sign-pin-back-5.
What’s with strcpy? Do we want to play code-obfuscation to look smart, being able to decode this pile of letters? What’s wrong with string·copy or even stringcopy (compare photocopy)? Or even simply copy? If we want to avoid a redundant identifier without relying on overloading through argument types, English is rich in synonyms: for example duplicate, replicate, reproduce.

Various parentheses could just as well be optional to ease code browsing if proper typography is already in place, and English already provides many adverbs/prepositions that could replace/complement them into linguistically more usual counterparts.
Speaking about prepositions, using from and to as identifiers for things which would be far more aptly described with nouns is really such a confusing choice. What’s wrong with origin/source and destination/target? It’s also a bit counterproductive to put the identifier, which is the main point of interest, at the very end of its declaration statement.
Equal for assignment is really just an artifact standing in for a more relevant symbol like ← or ≔, because most keyboard layouts stem from disastrous design. But using a more adequate symbol is really pushing for unnecessarily obscure notation.
Mandatory semicolon to end a statement is obviously also a typographical nonsense.
If a parameter is to be left blank in for, we would obviously be better served with a separate control-flow construct rather than some way to highlight that it’s not filled in for that use.
So packing it all:
duplicate as function ⟨
    requiring (
        origin as sign-pin-register,
        destination as sign-pin-register
    )
    making {
        save as sign-pin
        save assigned origin
        destination-pin assigned origin-pin until ( zeroized,
            whilst [
                origin-increment,
                destination-increment
            wrought ]
        done )
        return save
    made }
built ⟩
Given that in that case the parentheses and commas are purely ornamental, the compiler could just ignore them and would have enough information with something like:

duplicate as function
    requiring
        origin as sign-pin-register
        destination as sign-pin-register
    making
        save as sign-pin
        save assigned origin
        destination-pin assigned origin-pin until zeroized
            whilst
                origin-increment
                destination-increment
            wrought
        done
        return save
    made
built
Or even:

duplicate as function requiring origin as sign-pin-register destination as sign-pin-register making save as sign-pin save assigned origin destination-pin assigned origin-pin until zeroized whilst origin-increment destination-increment wrought done return save made built
Status quo fallacy alert. Arguments are not forever mired in a current state of affairs. People can learn and can build tools to help them do better.
This could change quickly; e.g. if Claude or GitHub or (Your Team) decide to prioritize how source code looks.
But formatting still doesn't matter. Outside of whitespace-dependent languages, formatting is a subjective thing -- it's a people concern, not a computer concern. I can store my JavaScript as AST if I want to.
<span>foo</span>

vs:

<span>
  foo
</span>
All of this seems doable; I just think for the most part we don't care very much about our preferences, as it has very little impact on readability. It's definitely doable, though: we could each view the code however we wanted and have it stored in a different formatting. It might not be 100% round-trip stable, but that probably doesn't matter.
It's always better when the defaults can be overridden and formatting enforced, and when we only format new and changed lines to reduce potential instability; but again, go fmt doesn't really suffer from this, so it's possible to make things pretty reliable. It's simple, really: there is a default formatting and the code is stored that way, and we can then have our view of choice reformat the code as we want it; when it's stored, it's stored in the default.
Leave code format up to the primary owner of the file. It is pretty rare that code has more than one person that does 95% of the edits on a file so let them own the formatting. In the rare case where there are shared files with shared edits then it is ok to mandate some sort of enforced format but those are so rare that it generally isn't worth discussing. The proposed approach here ignores all the messy non-standard stuff that happens because of the margins or the rules that are very hard to build in when codifying personal coding style.
Let me have my messy desk and I'll let you have yours.
It's such a cool idea, though I haven't spent much time using it in anger, so it's hard to say if it's a useful one.
I'm just waiting for a breakthrough project to show that it's ready for wider adoption. Leaving text-based tooling is a big ask.
The principles behind Unison, for those who haven't read them yet: https://www.unison-lang.org/docs/the-big-idea/#richer-codeba...
> Each Unison definition is identified by a hash of its syntax tree.
The project is dead enough that they no longer own the TLD for the company. As far as I know, the only remnants of the project are youtube recordings of demos held at conferences.
Some languages (java) really need the extra horizontal space if you can afford it and aren’t too hard to read when softwrapped.
Log statements however I think have an effectively unbounded length. Nothing I hate more than a stupid linter turning a sprinkling of logs into 7 line monsters. cargo fmt is especially bad about this. It’s so bad.
Sent from my 49” G9 Ultrawide.
What I actually want from a linter is “120, unless the trailing bits aren’t interesting in which case 140+ is fine”. The ideal rule isn’t hard and fast! It’s not pure science. There’s an art to it.
All that said, I'm interested in this 132 number, where does it come from?
Interesting here perhaps is that even back then it was recognized that, for different situations, different display modes were advantageous.
I'd forgotten that; now that was a fugly font. I don't think anyone ever used it (aside from the "Setup" banner on the settings screen)
I think the low pixel count was rather mitigated by the persistence of phosphor though - there are reproductions of the fonts that had to take this into account; see the stuff about font stretching here: https://vt100.net/dec/vt220/glyphs
Really suits each language imo. Although I could probably get away with 80, the habit of using tailwind classes can get messy compared to 120.
16:9 is rarely what you want for anything that is mainly text.
But someone will always have to either scroll horizontally or wrap the text. I’m speaking as someone who often views code on my phone, with a ~40 characters wide screen.
In typography, it’s well accepted that an average of ~66 chars per line increases readability of bulk text, with the theory being that short lines require you to mentally «jump» to the beginning of the next line frequently which interrupts flow, but long lines make it harder to mentally keep track of where you are in each line. There is however a difference between newspapers and books, since shorter ~40-char columns allows rapid skimming by moving your eyes down a column instead of zigzagging through the text.
But I don’t think these numbers translate directly to code, which is usually written with most lines indented (on the left) and most lines shorter than the maximum (few statements are so long). Depending on language, I could easily imagine a line length of 100 leading to an average of ~66 chars per line.
In my experience, with programming you rarely have lines of 140 printable characters. A lot of it is indentation. So it’s probably rarely a problem to find your way back on the next line.
For C/C++ headers I absolutely despise verbose doxygen bullshit comments spreading relatively straightforward functions across 10 lines of comments and args.
I want to be able to quickly skim function names and then read arguments only if deemed relevant. I don’t want to read every single word.
I like splitting long text as in log statements into appropriate source lines, just like you would a Markdown paragraph. As in:
logger.info(
    "I like splitting long text as in log statements " +
    "into " + suitableAdjective + " source lines, " +
    "just like you would a Markdown paragraph. " +
    "As in: " + quine);
I agree that many formatters are bad about this, like introducing an indent for all but the first content line, or putting the concatenation operator in the front instead of the back, thereby also causing non-uniform alignment of the text content.

With some expressions, like lookup tables or bit strings, hand wrapping and careful white space use is the difference between “understandable and intuitive” and “completely meaningless”. In JS world, `// prettier-ignore` above such an expression preserves it, but ideally there’s a more universal way to express this.
Boy that was fast.
But that's the core of this article, too; since then it's become the norm to store the plain text source code in git and share it, but it mentions a code- and formatting-agnostic storage format, where it's down to people's editors (and diff tools, etc) to render the code. It's not actually unusual, since things like images are also unreadable if you look at their source code, but tools like Github will render them in a human-digestible format.
could. Yesterday notepad (win 10) just plainly refused.
About once every other project, some portion of the source benefits from source code being arranged in a tabular format. Long lines which are juxtaposed help make dissimilar values stand out. The following table is not unlike code I have written:
setup_spi(&adc,    mode=SPI_01, rate=15, cs_control=CS_MUXED,  cs=0x01);
setup_spi(&eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED,  cs=0x02);
setup_spi(&mram,   mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);
Even if we add 4-5 more operational parameters, I find this arrangement much more readable than the short-line equivalent:

setup_spi(&adc,
          mode=SPI_01,
          rate=15,
          cs_control=CS_MUXED,
          cs=0x01);
setup_spi(&eeprom,
          mode=SPI_10,
          rate=13,
          cs_control=CS_MUXED,
          cs=0x02);
setup_spi(&mram,
          mode=SPI_10,
          rate=50,
          cs_control=CS_DIRECT,
          cs=0x08);
Or worse, the formatter may keep the long lines but normalize the spaces, ruining the tabular alignment:

setup_spi(&adc, mode=SPI_01, rate=15, cs_control=CS_MUXED, cs=0x01);
setup_spi(&som_eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED, cs=0x02);
setup_spi(&mram, mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);
Sometimes a neat, human-maintained block of 200 character lines brings order to chaos, even if you have to scroll a little.

I've often wished that formatters had some threshold for similarity between adjacent lines. If some X% of the characters on the line match the character right above, then it might be tabular and it could do something to maintain the tabular layout.
Bonus points if it's able to do something like diff the adjacent lines to detect table-like layouts, figure out if something nudged a field or two out of alignment, and then insert spaces to fix the table layout.
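A rough sketch of that similarity heuristic in Python (the threshold is an arbitrary assumption, not something any formatter actually ships):

def alignment_ratio(line: str, prev: str) -> float:
    """Fraction of character positions that match the line directly above."""
    if not line or not prev:
        return 0.0
    matches = sum(1 for a, b in zip(line, prev) if a == b)
    return matches / max(len(line), len(prev))

TABULAR_THRESHOLD = 0.4  # "some X% of the characters match" -- assumed value

def tabular_lines(lines):
    """Flag runs of lines that look like a hand-aligned table."""
    flags = [False] * len(lines)
    for i in range(1, len(lines)):
        if alignment_ratio(lines[i], lines[i - 1]) >= TABULAR_THRESHOLD:
            flags[i] = flags[i - 1] = True
    return flags

A formatter could then leave the flagged runs untouched (or re-align them) instead of normalizing their spaces.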
And sometimes, if the code doesn't look good after automatic formatting, the code itself needs to be fixed. I'm specifically thinking about e.g. long or nested ternary statements; as soon as the auto formatter spreads it over multiple lines, you should probably refactor it.
Thus 80 or perhaps 120 char line lengths!
Especially 80 characters is a ridiculously low limit that encourages people to name their variables and functions some abbreviated shit like mbstowcs instead of something more descriptive.
80 is probably too low these days but it's nice for git commit header length at least.
What a terrible attitude to have when working with other people.
"Oh, I'm the only one who writes Python? Fix your setup. why should I, who know python, not write it for your sake?"
"Oh, I'm the only one who speaks German? Fix your setup. Why should I, who know German, not speak it for your sake?"
How about doing it because your colleagues, who you presumably like collaborating with to reach a goal, asks you to?
>How about doing it because your colleagues, who you presumably like collaborating with to reach a goal, asks you to?
If a someone wants me to do a certain thing in a certain way, they simply have to state it in terms of:
- some benefit they want to achieve
- some drawback they want to avoid
- as little as an acknowledged unexamined preference like "hey I personally feel more comfortable with approach X, how bout we try that instead"
I'm happy to learn from their perspective, and gladly go out of my way to accomodate them. Sometimes even against my better judgment, but hell, I still prefer to err on the side of being considerate. Just like you say, I like to work with people in terms of a shared goal, and just like you do, in every scenario I prefer to assume that's what's going on.
If, however, someone insists on certain approaches while never going deeper in their explanations than arbitrary non-falsifiable qualifiers such as "best practice", "modern", "clean", etc., then I know they haven't actually examined those choices that they now insist others should comply with. They're just parroting whatever version they imagine of industry-wide consensus describes their accidental comfort zone. And then boy do they hate my "make your setup assume less! it's the only way to be sure!". But no, I ain't reifying their meme instead of what I've seen work with my own two.
You're moving the goalposts of this discussion. The guy I was responding to said "fix your setup" to another person saying "Your table wrapped for me. The short line equivalent looks best on my screen." That's a stated preference based on a benefit he'd like to achieve.
We are not discussing "best practice" type arguments here.
Working together with others should not mean having to limit everyone to the lowest common denominator, especially when there are better options for helping those with limitations that don't impact everyone else.
Just use descriptive variable names, and break your lines up logically and consistently. They are not mutually exclusive, and your code will be much easier for you and other people to read and edit and maintain, and git diffs will be much more succinct and precise.
Because "I" might be older or sight-impaired, and have "my" font at size 32, and it actually fills "my" (wider than yours) screen completely?
Would you advise me to "fix my eyes" too? I'd love to!
"Why should I accommodate others" is a terrible take.
80-column line lengths is a pretty severe ask.
setup_spi(
    &adc,
    mode=SPI_01,
    rate=15,
    cs_control=CS_MUXED,
    cs=0x01
);
setup_spi(
    &eeprom,
    mode=SPI_10,
    rate=13,
    cs_control=CS_MUXED,
    cs=0x02
);
setup_spi(
    &mram,
    mode=SPI_10,
    rate=50,
    cs_control=CS_DIRECT,
    cs=0x08
);
ftfy

However, it is the formatting I adopt when forced to bow down to line length formatters.
This is why a Big Dictator should just make a standard. Everyone who doesn't like the standard approach just gets used to it.
setup_spi(&adc,
          mode=SPI_01,
          rate=15,
          cs_control=CS_MUXED,
          cs=0x01
);
setup_spoo(&adc,
           mode=SPI_01,
           rate=15,
           cs_control=CS_MUXED,
           cs=0x01
);
setup_s(&adc,
        mode=SPI_01,
        rate=15,
        cs_control=CS_MUXED,
        cs=0x01
);
validate_and_register_spi_spoo_s(&adc,
                                 mode=SPI_01,
                                 rate=15,
                                 cs_control=CS_MUXED,
                                 cs=0x01
);
setup_spi(&adc, mode=SPI_01, rate=15, cs_control=CS_MUXED, cs=0x01);
setup_spi(&eeprom,
          mode=SPI_10,
          rate=13,
          cs_control=CS_MUXED,
          cs=0x02);
setup_spi(&mram, mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);
The pain point you describe is real, which is why that was intentionally added as a feature.
Of course it requires a language that allows trailing commas, and a formatter that uses that convention.
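A small sketch of how that plays out with Black's "magic trailing comma" (the `setup_spi` stub here is just for illustration, not the C call above):

def setup_spi(dev, mode=None, rate=None):
    ...  # stub so the example runs

# A trailing comma after the last argument tells Black to keep the call
# exploded, one argument per line, even though it would fit on one line:
setup_spi(
    "adc",
    mode="SPI_01",
    rate=15,
)

# Without the trailing comma, Black collapses the call when it fits:
setup_spi("adc", mode="SPI_01", rate=15)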
In a post-modern editor (by which I mean any modern editor that takes this kind of thing into consideration, which I don't think any do yet) it should be possible for the editor to determine similarity between lines and achieve a tabular layout, perhaps also with styling for dissimilar values in cases where the table has a higher degree of similarity than the one above. Perhaps also with collapsing of tables, with some indicator that what is collapsed is not just a sub-tree but a table.
setup_spi(&adc, mode=SPI_01, rate=15, cs_control=CS_MUXED,
          cs=0x01);
setup_spi(&eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED,
          cs=0x02);
setup_spi(&mram, mode=SPI_10, rate=50, cs_control=CS_DIRECT,
          cs=0x08);
over the short-line alternative presented.

I like short lines in general, as having a bunch of short lines (which tend to be the norm in code) and then suddenly a very long line is terrible for readability. But everything has exceptions. It's also very dependent on the programming language.
But are there more examples? Maybe it's not a high price to pay. I use either the second or the third approach for my code and I've never had many issues. Yes, the first example is pretty, but it's not a huge deal for me.
1) Horizontal scrolling sucks
2) Changing values easily requires manually realigning all the other rows, which is not productive developer time
3) When you make a change to one small value, git shows the whole line changing
And I ultimately concluded code files are not the place for aligned tabular data. If the data is small enough it belongs in a code file rather than a CSV you import then great, but bothering with alignment just isn't worth it. Just stick to the short-line equivalent. It's the easiest to edit and maintain, which is ultimately what matters most.
Unless they have been a thing since the start of a project; existing code should never be affected by formatters, that's unnecessary churn. If a formatter is introduced later on in a project (or a formatting rule changed), it should be applied to all code in one go and no new code accepted if it hasn't passed through the formatter.
I think nobody should have to think about code formatting, and no diff should contain "just" formatting changes unless there's also an updated formatting rule in there. But also, you should be able to escape the automatic formatting if there is a specific use case for it, like the data table mentioned earlier.
I’ve never understood why people care so much about the linter. Just let people write code and don’t worry about the linter. I don’t need to fight a linter which makes my code worse when I could just write it in a way that doesn’t suck. I promise it’ll be fine. I’m too busy doing actual software engineering to care if code is not perfectly formatted to some arbitrary style specification.
I feel like style linters are horseshoe theory. Use them enough and eventually you wrap back around to just living without them.
I find linters make me faster. Sometimes I’m feeling lazy and I just want to pump out a bunch of lines of ugly code with mappings poorly formatted, bad indents, and just have it all synched up when I save.
I write perfectly legible code. More legible than a linter, in fact. Because the rules for what is ideal are not so simple as to be encoded in simple lint rules. Sure, it gets like 95%. But the last 5% is so bad it ruins the positives.
If your goal is “code that is easy to read and understand” then a linter is only maybe the first 20%. Lots of well linted code is thoroughly inscrutable.
I'll gladly pay the price of making the one person's code worse if it improves the other nineteen's.
I 100% believe you. And for god's sake please use linter.
British and American spelling are both 100% legible English. But when multiple people coauthor a book, they should stick to one instead of letting each author use their favorite spelling.
99% of the time the linter is not enforcing correctness, in my experience. It's just enforcing a bunch of subjective aesthetic constraints: which import order, max number of empty lines between statements, what type of string literal to use, no trailing white space, etc. A non-trivial part of my day is spent dealing with this giant catalog of dinner etiquette. Not all of it is auto-fixable. Also, there are plenty of situations where everyone would agree that violating the rule is necessary (e.g. "no use before define" but you need mutual recursion). Also sometimes rules are circularly in conflict (e.g. you have to change a line but there is no way to do it without violating the max-line-length rule).
Linters enforcing rules that need to be broken is a pet peeve of mine, and I agree with you there. Most linters allow for using comments to explicitly exclude certain lines from being linted. This should ~never be necessary. If it is regularly necessary, then either you're programming bad (always a possibility!) or the rule has too many false positives and you should remove it.
To be frank, everyone I've worked with that complained about the linter didn't know much about their tooling. They didn't know about the fix command (even though I put it in the readme and told them about it), they didn't know how to turn on lintfix and prettier on save, wouldn't switch on git hooks and didn't know their lint failed until GitHub said so, and none of the people like this were so productive that it made up for this trait.
There's a Python formatter named `black` and it converts my code:
important_numbers = {
    "x": 3,
    "y": 42,  # Answer to the Ultimate Question!
    "z": 2
}
into this:

important_numbers = {"x": 3, "y": 42, "z": 2}  # Answer to the Ultimate Question!
This `black` is non-configurable (because it's "opinionated") and yet, out of some strange cargo cult, people swear by it and try to impose it on everybody.

Why are you caring about formatting? Just write your code, get it working, let Black tidy it up in the standard way. Don't worry about the formatting.
In cases where you're annoyed about some choice the formatter makes, somebody else would be equally annoyed by the choice you would rather make. There is no perfect solution. The whole point is to have a reasonable, plausible default, and to automate it so that nobody has to spend any time thinking about it whatsoever.
Running a standard formatter when code is checked in minimizes the source control churn due to re-formatting. That churn is a pointless waste of time. If you don't run a standard formatter, I guarantee that badly-formatted code will make it into source control, and that's annoying.
There's a quote from Steve Jobs (or maybe his carpenter father):
“When you’re a carpenter making a beautiful chest of drawers, you’re not going to use a piece of plywood on the back, even though it faces the wall and nobody will ever see it. You’ll know it’s there, so you’re going to use a beautiful piece of wood on the back. For you to sleep well at night, the aesthetic, the quality, has to be carried all the way through.”
When you say "Don't worry about the formatting", what you're saying is "use a piece of plywood on the back," and I'm just not going to do that.

I just honestly believe that if you fully automate the formatting, the results are better than if you do it painstakingly by hand; better by virtue of being more consistent. It's using the right tool for the job.
I don’t really care about whether the back is plywood or whatever. I don’t know how to write plywood code. I do care about creating clear, readable code that communicates my intent. Sometimes formatters help with that. Often they hinder, as they reflect the arbitrary aesthetic preferences of their creators.
important_numbers = {
    "x": 3,
    "y": 42,  # Answer to the Ultimate Question!
    "z": 2,
}
You might complain that that seems a bit obscure, but it only took me 10 or 20 seconds to discover it after pasting the original code snippet into an editor.

The trailing comma is an improvement as it makes the diff clearer on future edits.
Edit to add: occurs to me that I oversimplified my position earlier and it probably looks like I'm trying to have it both ways. I do advocate aiming for clean and clear formatting; I'm just against doing this manually. You should instead use automation, and steer it lightly only when you have to.
For example, I explicitly don't want people to manually "tab-align" columns in their code. It looks nice, sure, but it'll inevitably get messed up in future edits. Better to do something simpler and more robust.
In the above example, if I think I have listed all of the `important_numbers`, there is a certain point to not having the trailing comma there.
Here's another terrible example from `black`:
From this:
my_print(f"This string has two parameters, `a` which is equal to {a} and `b` which is equal to {b}",
a=1, b=2)
To this:

my_print(
    f"This string has two parameters, `a` which is equal to {a} and `b` which is equal to {b}",
    a=1,
    b=2,
)
The trailing comma it added makes no sense whatsoever because I cannot have an intent of adding more things -- I've already exhausted the parameters in the string!

On top of it, I don't quite get why I need to change the way I write in order to please the machine. Who should be serving whom?
Edit: changed "print" to "my_print" to not have to argue about named parameters of print ("sep", "file" etc.).
Edit 2: here's a variant that `black` has no issues with whatsoever. It does not suggest a trailing comma or any other change:
my_print(f"This string has two params, `a` which is {a} and `b` which is {b}", a=1, b=2)
So the existence of a trailing comma is a product of string length?

Who's to say you don't add a new argument to the function in the future, like:
my_print(
    "This string has two parameters, `a` which is equal to {a} and `b` which is equal to {b}",
    a=1,
    b=2,
    color_negative_red=True,
)
Don’t get me wrong: modern linters often annoy me and devs who spend a lot of time fiddling with those settings tend not to be very good programmers. But sometimes having guardrails is necessary.
Now you are bikeshedding. Just go with the defaults.
Note that, in my mind, this visualization is not automatically generated, but lovingly created by humans who wish their code to be understood by others. It is not separate from the code, as typical design documentation is, but an integral part of it, stored in metadata. Consider it an extension of variable and function naming.
There is of course "literate programming" [1], but somehow (improvements of) that never took off in larger systems.
I suppose this is because nobody has been able to create good tooling for it (the visualization itself, the efficient editing, etc). You'll have to deal with the text version of it at some point if not all tools that we rely on get a version for the new visualization.
Another hypothesis is that it might not matter this much that we work with text directly after all.
> Note that, in my mind, this visualization is not automatically generated, but lovingly created by humans who wish their code to be understood by others.
If you allow manual crafting there, I suspect you'll need some sort of linting too.
I really wish we lived in a universe where a Lisp became the lingua franca of the world instead of JavaScript, as almost happened with Netscape, but alas ...
Virtually all programming languages are parsed into ASTs, and these ASTs can be serialized back. This is what formatters/"prettifiers" usually do.
Did I miss something?
My guess is it is the same reason why the most common form of creating source code is typing and not other readily available mechanisms:
Semantic density
Graphical visualizations are approachable representations and very useful for introductory, infrequent, and/or summary needs. However, they become cumbersome when either a well-defined repetitive workflow is used or usage variations are not known a priori.

Examples of both are the emacs and vi editors. The vast majority of supported commands are at most a few keystrokes and any programming language source code can be manipulated by them.
We must include the standard I/O definitions, since we want to
send formatted output to stdout and stderr.
<<Header files to include>>=
#include <stdio.h>
@
Not hard to see why nobody really embraced it. And not helped by the fact that it was published right around the time that best practice was switching toward "don't comment unless absolutely necessary".

Source code formatting programs are not the same as lint[0] programs. The former rewrites source code files such that the output is conformant with a set of layout rules without altering existing logic. The latter is a category of idempotent source code analysis programs typically used to identify potential implementation errors within otherwise valid constructs.
Some language tools support both formatting and source code analysis, but this is an implementation detail.
They slyly add git noise and pollute your audit trails by just going through and moving shit around whenever you save a file.
And sometimes, they actually insert bugs - string formatting errors are my favorite example.
It's for people who think good code is about adhering to aesthetic ideologies instead of making things documented and accountable.
This is most noticeable in open source contributions. Sometimes I'll get a pull request with like 2 lines of change and 120 lines of some reformatting tool.
You think I accept that?
It's not a good idea
This wouldn't happen nearly as much if you had a defined set of formatting rules plugged into CI instead of chaos
This only happens because the file doesn't already adhere to the rules it's implementing. These tools are normally highly configurable, and once your code complies with a standard, the tool prevents future code from pulling you away from that standard.
> And sometimes, they actually insert bugs - string formatting errors are my favorite example.
Do you have a concrete example?
> Sometimes I'll get a pull request with like 2 lines of change and 120 lines of some reformating tool.
Is your existing code formatting at least consistent?
> You think I accept that?
This is a social issue rather than a technical one. You can tell people in your development readme to use specific style rules, or even a project-wide precommit hook. If your own code is formatted with one of these tools, you can even (to my understanding) set up automated checks on GitHub's side.
But of course you are free to reject any PR you want.
Also, there are many linters that also do formatting, blurring the "line" you're pointing at.
Yes, both git and all these PLs are actually damn stupid to take lines at face value instead of doing something more elegant like Ada does. In my 20+ year career I've been offered a project that involved Ada only once.
It's hard to come up with something elegant and efficient. It's even harder to make it reach top-tier global presence, all the more when the ecological niche is already filled with good-enough stuff.
[
  'apple',
  'banana',
  'orange',
]

has an advantage over

[
  'apple',
  'banana',
  'orange'
]
Because adding a new line at the end of the table (1) requires editing 1 line instead of 2, and (2) makes the diffs in code review smaller and easier to read and review. So a bad choice makes my life harder. The same applies to local variable declarations.

Sorted lists (or sorted includes) are also something that makes my life easier. If they're not sorted then everyone adds their new things to the end, which means there are many times more merge conflicts. Sorted doesn't mean there are zero, but it does mean there are fewer than with "append to the end". So, just like an auto-formatter is there to save time, don't waste my time by not sorting where possible.
Also, my OCD hates inconsistency. So

  [1, 2, 3]
  {a, b, c}

is ok, and

  [ 1, 2, 3 ]
  [ a, b, c ]

is ok, but

  [1, 2, 3]
  { a, b, c }

is not. I don't care which but pick ONE style, not two styles!

And if there is something more important, then instead of micro-optimizing the rules when there is strong disagreement, it’s probably best if one of the parties takes the high road and lives with it so you can all focus on what matters.
Not to mention the overhead of running these worthless inefficient tools on every commit (even locally).
Tools like this just raise the debate from different opinions about formatting to different opinions about workflows. Workflows impact productivity a lot more than formatting.
[ 'apple'
, 'banana'
, 'orange'
]
But the scale of technical debt this insight has revealed is depressing.
[
, a
, b
, c
]
If only there existed a language designer intelligent enough to support it:

key:
  - a
  - b
  - c
I'm just suddenly slightly terrified someone's going to see this and think it's genuinely a good idea and make it part of the next popular scripting language, where lists are defined by starting commas or something :S
All that really matters is consistency. Let a team make some decisions and then just move forward.
I’ve seen companies with such a large amount of developer churn that literally one person was left defending the status quo saying “we do X here, we voted on it once in 2019 and we’re not changing it just for new people”. 90% of the team were newcomers.
(The better teams I’ve worked on maintain a core set of leaders who are capable of building consensus through being very agreeable and smart. Gregarious Technocracy >> Popular Democracy!)
Not so! The amount of tokens correlates with perceived code complexity for some. One example is how some people can't unsee or look past Lisp's parentheses.
Another example is how some people get used to longDescriptiveVariableNames but others find that overwhelming (me for instance) when you have something like:
userSignup = do
  let fullName = userFirstNameInput + userLastNameInput
      userName = take 1 userFirstNameInput + take 10 userLastNameInput
  saveToDB userName
The above isn't bad, but imagine variables named that verbosely used over and over, especially in the same line.

Compare it to:
userSignup = do
  let fullName = firstName + lastName
      userName = take 1 firstName + take 10 lastName
  saveToDB userName
The second example loses some information, but I'd argue it doesn't matter too much given the context one would typically have in a function named `userSignup`.

I've had codebases where consistency required naming all variables like `firstNameInputField` rather than just `firstName` and it made functions unreadable because it made the unimportant parts seem more important than they were simply by taking up more space.
This judgement is rather based on a strong personal opinion (which I don't claim to be wrong, but also not to be god-given) on what is one, and what are two, changes in the code:
- If you consider adding an additional item to the end of the list to be one code change, I agree that a trailing comma makes sense
- On the other hand, it is also a sensible judgment to consider this to be a code change of two lines:
1. an item (say 'peach') is added to the end of the list
2. 'orange' has been turned from the last element of the list to a non-last element of the list
If you are a proponent of the second interpretation, the version that you consider to be non-advantageous is the one that does make sense.
first item,,,
second item,,
third item,
fourth item
In my experience, special treatment for the last item is rarely warranted, so a trailing comma is a good default. If you want the last item to be special, put a comment on that line, saying that it should remain last. (Or better yet, find a better representation of your data that does not require this at all.)

There do exist reasons why this can make sense:
- In an Algebraic Data Type implementation of a non-empty list, the last element uses a different type constructor than the one that appends an item to the front of an existing non-empty list (similarly to how, in an Algebraic Data Type implementation of an arbitrary list, the type constructor for the initial empty list is "special"); see the sketch after this list.
- In a single-linked list implementation, sometimes (depending on the implementation) the terminal element of the list is handled differently.
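A tiny illustration of the first point (a hypothetical Python rendition of a non-empty list, where the final element gets its own constructor):

from dataclasses import dataclass
from typing import Union

@dataclass
class Last:                # terminal element: its own "special" constructor
    value: str

@dataclass
class Cons:                # every other element prepends to an existing list
    value: str
    rest: "NonEmptyList"

NonEmptyList = Union[Cons, Last]

fruits = Cons("apple", Cons("banana", Last("orange")))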
---
By the way: at work, because adding parameters at the beginning of a (parameter) list of a function is "special" (because in the code for many functions the first parameters serve a very special purpose), but adding some additional parameter at the end is not, we commonly use parameter lists formatted like
'foo'
, 'bar1'
, 'bar2'
, 'blub'
Then why not consider it four changes?
3. 'banana' has been turned from the last-but-one element of the list to the last-but-two element of the list
4. 'apple' has been turned from the last-but-two element of the list to the last-but-three element of the list
Mine hates trailing commas :)
More seriously, I don't like having lists like that in the code in the first place. I don't want multiple lines taken up for just constant values, and if it turns out to require maintenance then the data should be in a config file instead anyway.
And there's no centralized idea on best practices.
Recently, I discovered that the ruff linter for Python doesn't like the assert statement because, since it does nothing in "optimized" mode, it isn't reliable. But such complaints about unit tests are not particularly useful.
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S101"]
(Besides, this was about formatting, not linting, but I realize it's related.)

When it comes to formatting, there are other languages (Go, Python?) that have clear, top-down guidelines applied by tooling, at least for code style. I think that's clever, and besides the odd mailing list post trying to change it because of a personal preference, it minimizes discussions about trivialities over the really important things.
Because 2 vs 4 spaces or line length discussions are ultimately futile; those aren't features, individual preferences don't matter. Codebases have millions of lines and thousands of developers; individual opinions do not matter at scale, consistency does.
If you write and edit and read and search code every day, code formatting is rather important.
What the pattern is doesn't really matter.
Took me a sec, but well played
>First off, I’d suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it’s a great symbolic gesture.
https://www.kernel.org/doc/html/v4.10/process/coding-style.h...
I think you just answered your own question ;-)
See my other comment: https://news.ycombinator.com/item?id=45166670
1. Assuming at least one person who cares about linter settings isn't utterly confused or moronic, what are their self-described reasons why they care? People's work styles, brains, and even sensory perception differ in some important ways!
2. As freedom-loving developers [1] who want to make our own choices to help our own styles of work, why should we even have to care about "enforcing" one standard for something that isn't really necessary? This one-standard-per-project thing is a downstream result of a design decision upstream (storing source code as plain text).
3. How should we design languages going forward? This brings the conversation back to top-level post (which is why we're here -- to think about what languages could be, not to rehash tired old debates, after all): how can we take what we've learned and build better languages -- perhaps ones where the primary source of truth for source code is not plain text?
[1] Slightly tongue-in-cheek. It is one thing to want to have the freedom to do our jobs well; it is another thing to turn this into advocacy for an overarching system such as a political philosophy or various decentralized financial mechanisms and so on. Here, I'm merely referring to the "let me do my job in the way that actually works for my brain" sense.
I can still live with it. And I like the clean, minimal version when I don’t have to edit. Just adding that “style” can have impact beyond how it looks involving ease of editing. And it stinks when your preferences clash with the community.
Yes, I can get used to other layouts, but that by no means implies all layouts are equal to me in terms of how readable they are, and how well things stand out when they should, or blend in when they should.

I recognise this isn't the case for everyone - some people read code beginning to end and it doesn't matter how it's laid out. But I pattern-match visually, and read fragments based on layout, and I remember code based on visual patterns.

Ironically, because I have aphantasia, and don't visualise things with my "mind's eye", but I still remember things by visual appearance and spatial cues better than by text.
I thought I was very smart. Like, really really smart, maybe the smartest programmer in the team.
And as such my opinion was very important. Maybe the most important opinion in the team. Everyone had to listen to it!
That is all. Also, I was wrong.
This is probably the only useful takeaway, but can you explain why you were wrong?
First and foremost I was wrong thinking that I was smarter than others — that's not even how intelligence works.
Second I was wrong being so stubbornly pro-tabs / anti-spaces (for example). It doesn't make that much of a difference, so there's no point in being so passionate about it.
And third I was wasting everyone's time (and my persuasion powers) by not choosing my battles more wisely.
My suggestion would be nowadays: let's choose a popular style guide, set up a linter and be done with it.
Arthur Whitney formats like this:
C vt[]="+{~<#,";
A(*vd[])()={0,plus,from,find,0,rsh,cat},
(*vm[])()={0,id,size,iota,box,sha,0};
If your code was formatted automatically like that, do you think you'd get used to it after a week?

My point is there is meaning in how code is formatted, and there is an effect on understanding for certain people.
I think that at a certain point of "reasonable" and for most "normal" people your statements hold true, but I don't want anyone to think that every person caught up on formatting is just doing it for bike-shedding or other trivial reasons.
I don't know what is actionable if what I say is true, but it feels important to say.
Another argument that is a pet peeve of mine is significant white-space vs curly braces. It literally doesn't matter. We often get new Python developers coming from a C# background and the amount of bitching about curly braces is so annoying. Just learn the language bro, it's not that hard.
This, however, usually doesn't affect me if the official format for a project is one way or the other because [drumroll] I just format my tree differently and then format to the official style when I push.
Here’s an old video of JetBrains MPS rendering a table from code https://www.youtube.com/watch?v=XolJx4GfMmg&t=63s
I’m hoping for an IDE able to render dictionaries as tables -- my wishlist doesn’t stop there.
Currently, we have a glimpse of those features, such as code folding, inlay hints, or docstrings rendered as HTML:
Black is great, but maybe it's just me since it aligns with how I like the code formatted.
Would there be any downsides for Python (or git?) in defining a standard way of formatting for saving a valid file, with all the formatting necessary to read the file happening in the IDE showing it?

That would very much fit with the Python ethos: 'There should be one-- and preferably only one --obvious way to do it.'

I can't see a crazy huge downside from a Python point of view, but it seems like a much bigger upside than flexible formatting would be needed to justify breaking from all of that stuff.
1. The developer has enough experience to understand that formatting matters.
2. The developer has enough discipline to stick with their chosen formatting rules.
3. The developer has the taste necessary to choose good formatting rules.
4. The developer has the judgement necessary to identify when other concerns justify one-off violations of the rules.
These are really important attributes for a developer to have. They affect every aspect of the code, not just formatting. Formatting is just a very quick proxy to measure those by.
Unfortunately, things like autoformatting and linter rules are destroying the signal. Goodhart's law strikes again.
- they have probably never worked on a codebase where files are edited by more than 1 person
- they have never done any significant amount of merging between branches
- they have never maintained a large codebase
- they have never had to refactor a large codebase
- they don't use diff/comparison tools to read the history of their codebase
- they have never written any tooling for their codebase
- they are not good team-players and/or only care about their own stuff
To go through the details: The post explicitly complained about a linter enforcing style rules. It did not object to the presence of mechanically-enforced style rules. In fact, it glorified them implicitly by saying how great it would be if everything was formatted at presentation-time. This glorification is the exact thing I was criticizing.
I think machine-enforced rules are bad because they destroy a communication channel that importantly has point 4 that I listed - when well-formatted code breaks its conventions, there must be a reason for it. That is important information that enforced presentation rules force to be put into another channel.
And it's certainly true that other channels do convey this other information, but I find more value in having it conveyed in the presentation channel than I do in having that channel replaced by mechanistic formatting.
This is the premise underlying the article that I object to. It is present so heavily in the subtext that if you pretend it's not, the post becomes incoherent.
And FWIW, HN rules say not to accuse people of not having read the article. I think that rule is mostly there because someone can read the article and notice something you missed, and it's wiser to not post than it is to assume you absorbed 100% of the context of the post.
Furthermore, instead of nitpicking over small details, it can actually be a good idea to just leave everything on default, forgo whatever your individual style might be and stick to what's been deemed to be good enough as the default - so the code will look more familiar to anyone who picks it up (and has used the tools you use for linting and formatting). Yes, formatting is different from linting; though if you set up one, you might as well do the other.
In my very limited experience, I learned the importance of penmanship in that profession.
In my much larger experience since, I've learned the irrelevance of penmanship to writing code. I don't practice my blueprint handwriting anymore. It would be wholly unfit-for-purpose without a bunch of practice. But I understand its value in that context.
If I understand the thrust of your comment correctly, you're pointing towards removing formatting as a channel being a net positive, despite the loss of all these indicators. I might almost agree with that, except for my point 4. Sometimes it's better, on the whole, to break conventions. Mechanical formatting systems cannot make these judgement calls.
I think the minor friction of explicit formatting is a net positive. I think the communication channel it adds carries more value than the friction it imposes hurts. (And I'm calling it explicit formatting because it doesn't have to be manual - it just has to be done with intention, judgement, and approval.)
I don't think the massive friction imposed by submitting code as ink on paper provides enough value to be worth its costs, by contrast.
It's talking about the Ada programming language and that its code was apparently stored not as plaintext but as an intermediate representation (IR) that could then be transformed back into code.
So formatting was handled by tooling by the nature of the setup. Developers would each have their own custom settings for "pretty printing" the code.
The author isn't saying don't use code formatters. They're highlighting an unusual approach that the industry at large isn't aware of. Instead of getting rid of arguments about code style via formatters, you can get rid of them by saving code in an IR instead of plaintext.
re: intermediate representation and projectional editing: yes, editors are now getting better at helping you refactor code (rename function in language XYZ is possible in language servers for IDEs, /no AI required, it works better when a human coded AST tool does it/)
projectional editors aren't around /because the more complex parts of it are harder/ - BUT - I could definitely see more intelligent refactor tooltips written by humans.
For example: in Rust, if I've been passing a pointer vs borrowing (or whatever), pattern A for most of my code, then pattern B and it complains, it would be useful to have a tooltip that goes "do you want to refactor all the other references/parameters to pattern B" instead of Rust's default "this function isn't using pattern A" borrow checker error.
https://git-scm.com/book/pt-br/v2/Customizing-Git-Git-Attrib...
There's nothing special about whitespace (unless you write python).
Capitalization and a bunch of other stuff in your coding convention document are usually just signs that you have poor tooling and lack of skill.
Give me a PR that satisfies the requirements and the appropriate test cases and i'll happily rewrite it to spaces only indented with curly braces on newlines and etc... as I see fit.
The hard part is the first two tasks; you can train an intern to do the third.
What about comments? Were they part of the IR?
(I agree with others that version control, grep etc. are also very important, and kind of a deal breaker).
btw: have a look at how much disdain was reserved for systemd and its plethora of binary blobs + custom tools (e.g. the journal stuff) ... and that was basically forced upon us by the distributions
I'm mainly just being pedantic to be honest; I realise my comment is just me essentially saying "what could possibly go wrong?"
The bigger problem is you now need custom tooling for your IDE, version control, diff & merge, code review, code hosting, etc. etc.
Raw text is amazing at smaller scales. The ability to apply a bunch of intermediate incorrect transformations to reach a valid destination is invaluable (like doing a bunch of hacky find/replace).
Projectional editors like JetBrains MPS have tons of disadvantages vs text, and the few advantages don't make up for it.
Formatting is a silly problem to have, but far beyond that why are we manipulating text files directly rather than editing a live program (ala Smalltalk). Text can just be the on-disk serialization format you never look at.
(Raw text is still how you edit individual functions and methods in Smalltalk, there just isn't any actual text file on disk)
Consider the following (pseudo-)code example:
bar.glob = 1;
bar.plu.a1 = 21;
bar.plu.coza = fol;
Should this code be formatted this way? Or should it be formatted

bar.glob     = 1;
bar.plu.a1   = 21;
bar.plu.coza = fol;

to emphasize that three assignments are done? Or should this code be formatted

bar.glob      = 1;
bar.plu .a1   = 21;
bar.plu .coza = fol;

to make the "depth" of the structure variables more tabular, so that you can immediately see by the tabular shape which "depth" a member variable has? We can go even further like

bar.glob     =   1;
bar.plu.a1   =  21;
bar.plu.coza = fol;

which emphasizes that the author considers it to be very important that the reader can easily grasp the magnitudes of the numbers involved (which is why in Excel or LibreOffice Calc, numbers are right-aligned by default). Or combining this with making the depth "tabular":

bar.glob      =   1;
bar.plu .a1   =  21;
bar.plu .coza = fol;

Each of these formattings emphasizes different aspects of the code that the author wants to emphasize. This information cannot be deduced from some abstract syntax tree alone. Rather, it needs additional information from the programmer about the sense in which the structure behind the code is to be "interpreted".

Storing the AST instead of the text is a lossy encoding, but would we lose something more valuable than what we gain? If your example is the best thing we'd lose, I'd say it's still net a massive win.
And there are ways to emphasize different parts that would survive the roundtrip to AST. E.g. one way to emphasize depth:

setValue([bar, glob], 1)
setValue([bar, plu, a1], 21)

or to emphasize the data:

configure(bar, 1, 21, fol)

Or heck, you could allow style overrides if you really wanted to preserve this kind of styling:

// $formatblk: tabular_keypaths, aligned_assignments
bar .glob    = 1
bar .plu .a1 = 21
// $formatblk-end
To have something that sometimes checks the types and sometimes doesn't is not a feasible solution.
Code should be generally written so it's easy to read.
https://github.com/airbnb/javascript/issues/1271
https://github.com/airbnb/javascript/issues/1122
I literally spent over an hour adapting an existing project to the airbnb config, even though the code was perfectly correct, clear, and maintainable. I ended up disabling those specific rules locally, and I never used it in another project. (Looks like the whole project is no longer maintained. Good riddance.)
The airbnb config is, in my view, the perfect example of unnecessarily wasting people's productivity when linting is done badly.
"It must have been good because Grady Booch says so".
Refactorings (when done right) are syntax tree transformations that preserve things like referential integrity, etc. that ensure code does the same thing before and after applying a refactoring.
A rename becomes trivial if you are simply working on the symbol directly. For that to work with file based source trees, you need to parse the whole thing, keep track of where symbols are referred in files, rename the symbol and then update all the places in the source tree. That stuff becomes a lot easier when the code representation isn't a bunch of files but the syntax tree. The symbol just gets a different name. Anything that uses the symbol will still use the same symbol.
People like editing files of course and that has resulted in a lot of friction developing richer tools that don't store text but something that preserves more structure. The fact that we're still going on about formatting issues a quarter century later maybe shows that this is something to revisit. For many languages and editors, robust symbol renames are still somewhat science fiction. And that's just the most basic refactoring.
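To make that concrete, here is a minimal sketch of a tree-level rename using nothing but Python's standard ast module. This is purely illustrative: it is not how DIANA, Eclipse, or any particular IDE does it, and a real tool would also have to respect scoping.

import ast

class RenameSymbol(ast.NodeTransformer):
    """Rename a function/variable by rewriting the tree, not the text."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:       # covers call sites and other uses
            node.id = self.new
        return node

    def visit_FunctionDef(self, node):
        if node.name == self.old:     # covers the definition site
            node.name = self.new
        self.generic_visit(node)
        return node

source = "def speed(d, t):\n    return d / t\n\nprint(speed(100, 9.58))\n"
tree = RenameSymbol("speed", "velocity").visit(ast.parse(source))
print(ast.unparse(tree))              # "pretty-printing" back to text

Nothing here ever searches the text for the string "speed", so a name that merely contains that substring, or a mention inside a string literal, is never touched.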
> That stuff becomes a lot easier when the code representation isn't a bunch of files but the syntax tree
You are just mixing abstraction layers here. That syntax tree still needs to be stored in file(s) somehow, and nothing prevents having syntax tree aware (or smarter) tooling operating on human readable files. Basically deserializing AST and parsing source code are the same thing. The storage format really isn't that significant factor here.
So what is needed is better tools rather than fiddling with storage format. Microsofts Roslyn is obvious example, but plenty of modern compilers are moving in the direction of exposing APIs to interact with the codebase.
Sure, but there are less flaky ways than spreading a syntax tree across files. Visual Age actually used a database for this back in the day. Smalltalk did similar things by storing code in an image file that contained both byte code and method definitions. You could export source code if you wanted. But wouldn't do that while developing typically. That's not an approach that caught on. But it has some advantages.
What you are describing is what Eclipse did with Java. Eclipse was the successor to Visual Age. The Eclipse incremental compiler for Java updated an internal data structure for the IDE. It could do neat things as partial compilation to enable running tests even in the presence of some compile errors. It also was really fast. By the time you stopped typing, it would have already compiled your code. Running the tests was similarly fast.
The problem of syncing a tree of source files with an AST is just a bit hard. IntelliJ never came close to this and has always had lots of trouble keeping its internal caches coherent. There's even a top level "invalidate caches" option in the File menu (still there, I checked. Right next to the Repair IDE option). IntelliJ was off by 2-3 orders of magnitude: seconds (at best) instead of milliseconds. I still miss Eclipse's speed every day I use IntelliJ.
Some compilers are taking steps toward supporting more advanced IDEs, but there aren't a lot of those beyond what JetBrains provides. VS Code support varies between languages, and mostly it's very limited on this front. The Rust compiler is one of those taking steps, though I don't know its current state; the compiler isn't exactly known for blazing performance. I'm not sure if JetBrains leverages many of those features in its Rust IDE (I'm not a Rust developer).
Imagine Java if you could…
na com.mycompany.myapp;
pu cl MyClass {
    pro sta i = 42;
    pri fi ch[] MAGIC = ['a', 'b'];
    pu sta v main(String[] args) {
        OtherClass otherClass = n OtherClass();
        f (i i = 0; i < MyClass.i; i++) {
            otherClass.hex(i, this.MAGIC);
        }
    }
}
10 ? "Hello"
20 gO 10
and a LIST command would yield
10 print "Hello"
20 goto 10
So saving commands as tokens in memory and formatting them on output was somewhat common back then. The speccy was more advanced in this regard (as mentioned in the parent comment), and it had the better BASIC for sure.
With modern tools it is easy to add formatting on save or on commit. So I don't understand what the fuss is about.
At the same time, for the most important tool in software engineering, Git, it matters which lines are changed. And it is better to only see actual logic changes, not to be swamped in tabs-vs-spaces or other changes that are just formatting.
That said, I would love to see more of this splitting between the actual internal representation and the view. Don't like something in the style guide (or even syntax, like curly brackets vs indentation)? Just change the view, like folding.
There are most likely good reasons why Ada and DIANA are not in widespread use.
You have to get everyone set up to use it, whereas everyone is already, of necessity, set up to use plain text.
And we aren't all using the same programming language and the same hardware setup.
Thus, specifically:
* everyone has to agree on an IR standard; if it can't accommodate every programming language, then there needs to be coverage for all the programming languages, and a way for software systems to know which one to use
* everyone has to have local software that can convert back and forth (they can't just rely on something built in to the "development system", I assume burned into a ROM)
* everyone's version control setup has to invoke that software as a commit hook
* the IR has to be designed in a way that allows for meaningful diffs, and the version-control software needs to be aware of how to diff and patch (which potentially also means a new standard for diff files)
<grabs popcorn>
I can write code that (IMHO) is substantially better than any formatter. But I've realized that there is no way to make other people on a team have the same opinions and skill as me, so I accept automatic code formatters.
I've never found a single formatter that formats my way though...
A. not everyone on your team is using prettier
B. not everyone is using the same config/agrees on what it should be
e.g. if the formatter is really shifting stuff around, your code might be too nested - if you have a compiler, let it take the strain.
kelseyfrog•17h ago
There's a scissor that cuts through the formatting debate: If initial space width was configurable in their editor of choice, would those who prefer tabs have any other arguments?
Avshalom•17h ago
the unix philosophy on the other hand only "thrives" if every other tool is designed around (and contains code to parse) "plain text"
lmm•15h ago
And how did that work out for them?
This seems like one of the many cases where unix won out by being a lowest common denominator. Every platform can handle plain text.
aleph_minus_one•8h ago
The lowest common denominator rather is binary blobs. :-)
jsharpe•17h ago
The goal of having every developer viewing the code with their own preferences just isn't that important. On every team I've been on, we just use a standard style guide, enforced by formatter, and while not everyone agrees with every rule, it just doesn't matter. You get used to it.
Arguing and obsessing about code formatting is simply useless bikeshedding.
mdaniel•11h ago
https://astyle.sourceforge.net/astyle.html#_style=whitesmith
And then someone said: oh yeah? Hold my beer https://astyle.sourceforge.net/astyle.html#_style=pico
Buttons840•13h ago
Unless it's an accessibility issue, and it is an accessibility issue sometimes.
raspasov•11h ago
Bah! So, what is more important? Is the average convenience of the herd more important? The average of convenience, as if there were ever such a thing.
What if you really liked reading books in paper format, but were forced to read them on displays for... reasons?
cowsandmilk•17h ago
What I would be curious on is tracing from errors back to the source code. Nearly every language I’ve used prints line number and offset on the line for the error. How that worked in the Diana world would be interesting to learn.
peanball•11h ago
[1]: https://github.com/Wilfred/difftastic
accelbred•16h ago
If we had a formatting tool that operated solely on the AST, checked-in code could be in a canonical form for a given AST. Editors could then parse the AST and display the source with different formatting of the user's choice, and convert back to canonical form when writing the file to disk.
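To illustrate what "canonical form for a given AST" could look like, here is a rough sketch using Python's stdlib ast module as a stand-in for that hypothetical formatting tool (a real tool would also need to preserve comments, which ast.unparse throws away):

import ast

mine   = "x = {'a':1,'b':2}\nif x: print( x )"
theirs = "x = { 'a': 1, 'b': 2 }\nif x:\n    print(x)"

# ast.unparse renders purely from the tree, so the original whitespace
# and line breaks no longer matter: same AST -> same bytes on disk.
canonical_mine   = ast.unparse(ast.parse(mine))
canonical_theirs = ast.unparse(ast.parse(theirs))

assert canonical_mine == canonical_theirs
print(canonical_mine)

Each editor could then re-render that stored form however the reader likes, as long as it parses back to the same tree.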
hnlmorg•10h ago
If we can’t progress our ecosystem because we are reliant on one very specific 50+ year old line parser, then that says more about the inflexibility of the industry to move forward than it does about the “new” ideas being presented.
komali2•10h ago
Grep works great.
theamk•1h ago
So the real choice is either:
- new tool: grep with caching reverse-formatter filter.
- new tool: ast-grep with understanding of AST serialization format for your specific language.
At least in the first case, you still have a fallback.
pmontra•9h ago
About grep and diff working on a textual representation of the AST: it would be like grepping on JavaScript source code when the actual source code is TypeScript or some other more distant language that compiles to JavaScript (does anybody remember CoffeeScript?). We want to see only the source code we typed in.
By the way, add git diff to the list of tools that should work on the AST but show us the real source code.
teo_zero•12h ago
> Everyone had their own pretty-printing settings for viewing [DIANA] however they wanted.
bee_rider•11h ago
I’m still confused because they specifically call the IR DIANA, and they talk about viewing the IR. It isn’t clear to me if the IR is more like a bytecode or something, or more like just the original source code with a little processing done to it. They also have a quote,
> Grady Booch summarizes it well: R1000 was effectively a DIANA machine. We didn't store source code: source code was simply a pretty-printing of the DIANA tree.
So maybe the other visualizations they could do by transforming the IR were so nice that nobody even cared to look at the original Ada that they’d written to generate it?
froh•13h ago
xslt was a Diana like pre-parsed representation of dsssl. oh how I miss dsssl (a scheme based sgml transformation language) but no. dsssl was a lisp! with hygienic macros! "ikes" they went and invented XSLT.
the "logic" escapes me to this day.
no. plain text it is. human readable. and grep/sed/diff able.
danielheath•12h ago
All the same tools can exist with a text backend, and you get grep/sed support for free too!
giveita•12h ago
This becomes an issue with say CI where maybe I add a gate to check something with grep. But whose format do I assume? My local (that I used to test it locally) or the canonical (which means I need to switch local format to test it)?
treadmill•11h ago
You would use the format on disk for the grep. "Your format" only exists displayed in your editor.
eviks•12h ago
Yes, of course, because tab width is *dynamically* flexible, so initial space width isn't enough
eviks•5h ago
But for "dirty-width" indents, eg, after some text that can vary in size (proportional fonts or some special chars even in fixed fonts) you can't align with spaces while a tab width can be auto-adjusted to match the other line
MyOutfitIsVague•11h ago
You still work with text, the text just isn't the canonical stored representation. You get diffs to resolve only when structure is changed.
You get most of the same benefit with a pre-commit linter hook, though.
bapak•8h ago
What happens when you stage the line `} else return {`? git doesn't let you stage specific AST nodes. It would also mean that you can't stage partial code (code that produces syntax errors).
Hendrikto•7h ago
You would still store text, and still check out text, just transformed text. You could still check in anything you want, including partial code, syntax errors, or any other arbitrary text. Diffs would work the same way they do now.
aleph_minus_one•8h ago
Perhaps this is rather a design mistake in how UNIX handles things and is so focused on text.
gr__or•8h ago
All of your examples work better for code with structural knowledge:
- grep: symbol search (I use it about 100x as often as a text grep) or https://github.com/ast-grep/ast-grep; a rough sketch of the symbol-search idea follows after this list
- diff: https://semanticdiff.com (and others), i.e.: hide noisy syntax only changes, attempt to capture moved code. I say attempt, because with projectional programming we could have a more expressive notion of code being moved
- sed: https://npmjs.com/package/@codemod/cli
- version control: I'd look towards languages like Unison to see what funky things we could do here, especially for libraries. A general example: no conflicts due to non-semantic changes (re-orderings, irrelevant whitespaces, etc.)
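To make the symbol-search point concrete, here is a hedged sketch using only Python's stdlib ast module (ast-grep and real language servers do far more, e.g. scope and type awareness; this only shows why a tree walk beats a text grep):

import ast

source = '''
# retry appears in this comment, and grep -w would happily match it
def retry(op, attempts=3):
    return [op() for _ in range(attempts)]

note = "retry also appears inside this string"
retry(lambda: 42)
'''

target = "retry"
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.FunctionDef) and node.name == target:
        print(f"definition at line {node.lineno}")
    elif isinstance(node, ast.Name) and node.id == target:
        print(f"reference  at line {node.lineno}")
# Only the def and the call are reported; the comment and the string
# literal are invisible to the tree walk.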
gr__or•3h ago
And there are abilities we lose completely by making text the source of truth, like a reliable version control for "this function moved to a new file".
theamk•1h ago
But if you store ASTs, you _have_ to have support for each language in each of the tools (because each language has its own AST). This basically means a major chicken-and-egg problem: a new language won't be compatible with any of the tools, so adoption will be very low until the editor, diff, sed etc. are all updated, and those tools won't be updated until the language is popular.
And you still don't get any advantages over text! For example, if you really cared about "this function moved to a new file" functionality, you could have a unique id after each function ("def myfunc{f8fa2bdd}..."), and insert/hide them in your editor. This way the IDE can show a clean definition, and grep/git etc. still work, just with extra noise.
In fact, I bet that any technology that people claim requires non-readable AST files, can be implemented as text for many extra upsides and no major downsides (with the obvious exception of truly graphical things - naive diffs on auto-generated images, graphs or schematics files are not going to be very useful, no matter what kind of text format is used)
Want to have each person see their own formatting style? Reformat to the person's style on load and format back to the project style on save. Modern formatters are so fast, people won't even notice this (a small sketch follows below).
Want fast semantic search? Maintain the binary cache files, but use text as source-of-truth.
Want better diff output? Same deal, parse and cache.
Want to have no files, but instead have function list and edit each one directly, a la Smalltalk? Maintain files transparently with text code - maybe one file per function, or one file per class, or one per project...
The reason people keep source code as text is that it's really a global maximum. The non-text format gives you a modest speedup, but at the expense of imposing incredible version-compatibility pain.
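As a small sketch of the reformat-on-load / reformat-on-save round trip mentioned above, assuming the black library as the formatter and treating on_load/on_save as hypothetical editor hooks (any formatter with a library API would do):

import black

PROJECT_STYLE = black.Mode(line_length=88)   # what gets committed
MY_STYLE = black.Mode(line_length=100)       # what I want to look at

def on_load(disk_text):
    # hypothetical hook: called when the editor opens a file
    return black.format_str(disk_text, mode=MY_STYLE)

def on_save(buffer_text):
    # hypothetical hook: called just before the file is written back
    return black.format_str(buffer_text, mode=PROJECT_STYLE)

disk = "def add(a,b):\n    return a+b\n"
assert on_save(on_load(disk)) == black.format_str(disk, mode=PROJECT_STYLE)

The round trip works here because black is deterministic for this input; a stored-AST system would give that property by construction.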
gr__or•35m ago
I'm also not saying we can have all these good things, but they are not free, and the costs are more spread out and thus less obviously noticeable than the ones projectional code imposes.
Tooster•7h ago
* [Difftastic](https://difftastic.wilfred.me.uk/) — my go-to diff tool for years
* [Nu shell](https://www.nushell.sh/) — a promising idea, but still lacking in design/implementation maturity
What I’d really like to see is a *viable projectional editor* and a broader shift from text-centric to data-centric tools.
The issue is that nearly everything we use today (editors, IDEs, coreutils) is built around text, and there’s no agreed-upon data interchange format. There have been attempts (Unison, JetBrains MPS, Nu shell), but none have gained real traction.
Rare “miracles” like the C++ --> Rust migration show paradigm shifts can happen. But a text → projectional transition would be even bigger. For that to succeed, someone influential would need to offer a *clear, opt-in migration path* where:
* some people stick with text-based tools,
* others move to semantic model editing,
* and both can interoperate in the same codebase.
What would be needed:
* Robust, data-native alternatives to [coreutils](https://wiki.archlinux.org/title/Core_utilities) operating directly on structured data (avoid serialize ↔ parse boundaries). Learn from Nushell’s mistakes, and aim for future-compatible, stable, battle-tested tools.
* A more declarative-first mindset.
* Strong theoretical foundations for the new paradigm.
* Seamless conversion between text-based and semantic models.
* New tools that work with mainstream languages (not niche reinventions), and enforce correctness at construction time (no invalid programs).
* Integration of the semantic model with existing version control systems.
* Shared standards for semantic models across languages/tools (something on the scale of MCP or LSP — JetBrains’ are better, but LSP won thanks to Microsoft’s push).
* Dual compatibility in existing editors/IDEs (e.g. VSCode supporting both text files and semantic models).
* Integrate knowledge across many different projects to distill the best way forward -> for example learn from Roslyn's semantic vs syntax model, look into tree sitter, check how difftastic does tree diffing, find tree regex engines, learn from S-expressions and LISP-like languages, check Unison, adopt the helix editor/vim editing model, see how it can be integrated with LSP and MCP etc.
This isn’t something you can brute-force — it needs careful planning and design before implementation. The train started on text rails and won’t stop, so the only way forward is to *build an alternative track* and make switching both gradual and worthwhile. Unfortunately it is pretty impossible to do for an entity without enough influence.
zokier•5h ago
https://docs.helix-editor.com/syntax-aware-motions.html
https://www.masteringemacs.org/article/combobulate-structure...
https://zed.dev/blog/syntax-aware-editing
Etc etc.
Tooster•3h ago
Without tools in mainstream editors I don't see how it can push us forward instead of staying a niche barely anyone knows about.
gorgoiler•3h ago
It’s a really subtle difference but I can’t quite put my finger on why it is important. I think of all the little text files I’ve made over the decades that record information in various different ways where the only real syntax they share is that they use short lines (80 columns) and use line orientation for semantics (lah-dee-dah way of saying lots of lists!)
I have a lot of experience of being firmly ensconced in software engineering environments where the only resources being authored and edited were source code files.
But I’ve also had a lot of experience of the kind of admin / project / clerical work where you make up files as you go along. Teaching in a high school was a great place to practice that kind of thing.
jrochkind1•3h ago
And yet it didn't, it reversed. I think the fact that "plain text for all source files" actually won in the actual ecosystem wasn't just because too many developers had the wrong idea/short-sightedness -- because in fact most influential people wanted and believed in what you say. It's because there are real factors that make the level of investment required for the other paths unsustainable, at least compared to the text source path.
it's definitely related to the "victory" of unix and unix-style OSs. Which is often understood as the victory of a philosophy of doing it cheaper, easier, simpler, faster, "good enough".
It's also got to do with how often languages and platforms change -- both change within a language/platform and languages/platforms rising and falling. Sometimes I wish this was less quick, I'm definitely a guy who wants to develop real expertise with a system by using it over a long time, and think you can work so much more effectively and productively when you have done such. But the actual speed of change of platforms and languages we see depends on reduced cost of tooling.
Ygg2•2h ago
Yes. Because YAML exists. And mixing tabs and spaces is horrible in it. And the rules are very finicky.
Optimal tab usage is to emit 2-4 spaces.