Also syntax is the interface through which you interact with the language, so bad syntax is going to be something annoying you have to deal with constantly. Sure you'll be able to write good programs in a language with bad syntax choices, but it's going to be less fun.
> Odin’s rules, which are very similar to Python’s, are to ignore newline-based “semicolons” within brackets (( ) and [ ], and { } used as an expression or record block).
Honestly I always thought that was a bit crap in Python and I'm surprised anyone thought this was a sensible thing to copy. Really, just use semicolons. As soon as an "easy" rule becomes even vaguely difficult to remember it's better to bin it and just require explicitness, because overall that is easier.
Computer languages are for humans to understand and communicate.
1. https://www.eecg.utoronto.ca/~jzhu/csc326/readings/iverson.p...
For this reason (coming from C++) I wished Swift were more popular because that syntax is much more familiar/friendly to me, while also having better memory safety and quality of life improvements that I like.
If you do like Swift you might want to just bite the bullet and embrace the Apple ecosystem. That would be my recommendation I think.
Strangely enough I find Lisp's parentheses much more attractive.
Personally, I bucket C++ and Rust and Swift under "basically the same syntax." When I think about major syntax differences, I'm thinking about things like Python's significant indentation, Ruby's `do` and `end` instead of curly braces, Haskell's whitespace-based function calls, Lisp's paren placement, APL's symbols, etc.
Before today I would have assumed that anyone who was fine with C++ or Rust or Swift syntax would be fine with the other two, but TIL this point exists in the preference space!
I understand exactly how shallow that makes me sound, and I'm not about to try and defend myself.
For this reason I was able to get into Odin as opposed to Zig, because of its similarities with Swift syntax as well as how easy it is to parse.
The less I need to rewire my brain to use xyz language, the greater the chance of me getting into it.
If my life depended on it, I could get over such a shallow reason to dismiss a language but fortunately it doesn't and that's why I write Swift rather than Rust.
I dislike the “you can change the syntax” argument because that just doesn’t happen. Closest thing is a new language that compiles to another.
The way Python treats whitespace is a huge design mistake that has probably wasted a century's worth (if not more) of time across all users, on something really trivial.
(edited several times to try to correct changes in formatting for an example here, but it's just screwed up :-/ )
Contrast an Algol-descendant like C, Pascal, Java, or even Python with a pure functional language like Haskell. In the former, control structure names are reserved words and control structures have a distinct syntax. In the latter, if you see `foo` in the body of a function definition you have no idea if it's a simple computation or some sophisticated and complex control structure just from what it looks like. The former provides more clues, which makes it easier to decipher at a glance. (Not knocking Haskell, here; it's an interesting language. But it's absolutely more challenging to read.)
To put it another way, syntax is the notation you use to think. Consider standard math notation. I could define my own idiosyncratic notation for standard algebra and calculus, and there might even be a worthwhile reason for me to do that. But newcomers are going to find it much harder to engage with my work.
For what it's worth, Python has been moving away from this, taking advantage of a new parser that can implement "soft keywords" like 3.10's "match" statement (which I'm pretty sure was the first application).
Believe it or not, the motivation for this is to avoid backward-compatibility breaks. Infamously, making `async` a keyword broke TensorFlow, which was using it as an identifier name in some places (https://stackoverflow.com/questions/51337939).
In my own language design, there's a metaprogramming facility that lets you define new keywords and associated control structures, but all keywords are chosen from a specific reserved "namespace" to avoid conflicts with identifiers.
This is a blinkered viewpoint. If you want to talk about syntax, at least mention the Haskell family (Elm, Idris, F*, etc), Smalltalk, and the king of syntax (less) languages, LISP (and Scheme), which teach us that syntax is a data structure.
> Another option is to do something like automatic semicolon insertion (ASI) based on a set of rules. Unfortunately, a lot of people’s first experience with this kind of approach is JavaScript and its really poor implementation of it, which means people usually just write semicolons regardless to remove the possible mistakes.
Though the joke is that the largest ASI-related mistakes in JavaScript aren't solved by adding more semicolons; it's the places where the language adds semicolons you didn't expect that trip you up the worst. The single biggest mistake is adding a newline after the `return` keyword and before the return value, accidentally producing a `return undefined` rather than returning the value.
In general JS is actually a lot closer to the Lua example than a lot of people want to believe. There's really only one ASI-related rule that needs to be remembered when dropping semicolons in JS (and it is a lot like that Lua rule of thumb), the Winky Frown rule: if a line starts with a frown it must wink. ;( ;[ ;`
(It has a silly name because it keeps it easy to remember.)
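A small sketch of both behaviors (function names are mine, for illustration): the return-newline trap where ASI hurts you, and the "winky frown" defense where a leading semicolon prevents a line-merge.

```javascript
// 1. The return-newline trap: ASI inserts a semicolon right after
// `return`, so the object literal below it is never returned.
function broken() {
  return
  { value: 42 }   // parsed as an unreachable block, not a return value
}

function fixed() {
  return { value: 42 }
}

console.log(broken())       // undefined
console.log(fixed().value)  // 42

// 2. The "winky frown" rule: a line starting with ( or [ or ` would
// otherwise be glued onto the previous expression (here, indexing
// into [1, 2]), so the line gets a leading semicolon.
const nums = [1, 2]
;[3, 4].forEach(n => nums.push(n))
console.log(nums.length)    // 4
```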
The recent go blog on error handling should make it clear that syntax is often not worth worrying about. https://go.dev/blog/error-syntax
(function() print("Test1") end)(); -- That semicolon is required
(function() print("Test2") end)()
Tangential, but I sidestepped this ambiguity in a language I've been designing on the side, via the simple rule that the function being called and the opening parenthesis can't have whitespace between them (e.g. "f()" is fine but "f ()" or "f\n()" is not). Ditto for indexing ("x[y]"). If these characters are encountered after whitespace, the parser considers it the beginning of a new expression.

By sacrificing this (mostly unused, in practice) syntactic flexibility, I ended up not needing any sort of "semicolon insertion" logic - we just parse expressions greedily until they're "done" (i.e. until the upcoming token is not an operator).
The C form `type name;` is ambiguous because it could actually be more than one thing depending on context - even worse if you include macro shenanigans. The alternative (~Rust/Zig) form, `var/const/mut name: type`, is unambiguous.
For humans, who carry a rather long memory of what is going on in the codebase, this is ~"not a problem" for experts. But for an LLM, whose knowledge is limited to the content currently in its context plus the conventions baked into its training corpus, this matters. Of course it is ALSO a problem for humans when they first look at a codebase, and when the types are unusual.
Like which do you think is more token-efficient?
1)
<tool-call write_code "my_function(my_variable)"/>
2)
<tool-call available_functions/>
resp:
<option> my_function </option>
<option> your_function </option>
<option> some_other_function </option>
<option> kernel_function1 </option>
<option> kernel_function2 </option>
<option> imported_function1 </option>
<option> imported_function2 </option>
<option> ... </option>
<tool-call write_function_call "my_function"/>
resp:
<option> my_variable </option>
<option> other_variable_of_same_type </option>
<tool-call write_variable "my_variable"/>

LLMs are notoriously bad at counting.
The `type name` vs `let name: type` distinction matters more than it seems. When the grammar is unambiguous, the LLM can parse intent from a partial file without needing the full compilation context that a human expert carries in their head. Rust and Go are notably easier for LLMs to work with than C or C++ partly because the syntax encodes more structure.
The flip side: syntax that is too terse becomes opaque to LLMs for the same reason it becomes opaque to humans. Point-free Haskell, APL-family languages, heavy operator overloading — these rely on the reader holding a lot of context that does not exist in the immediate token window.
I wonder if we will see new languages designed with LLM-parseability as an explicit goal, the way some languages were designed for easy compilation.
I've worked on fine-tuning projects. There's a massive bias towards fine-tuning for Python at several model providers, for example, followed by JS.
(I am aware of Combobulate[1] for Emacs folks, of which I'm sadly not one.)
There's an option beyond lisp. Forth has even less syntax.
My two biggest considerations when picking a language are:
- How well does it support value semantics? (Can I pass data around as data and know that it is owned by the declaring scope, or am I chained to references, potential nulls, and mutable handles with lifetimes I must consider? Can I write expression-oriented code?)
- How well does it support linear pipelining for immutable values? (If I want to take advantage of value semantics, there needs to be a way to express a series of computations on a piece of data in-order, with no strange exceptions because one procedure or another is a compiler-magic symbol that can't be mapped, reduced, filtered, etc. In other words, piping operators or Universal Function Call Syntax.)
I lean on value semantics, expression-oriented code, and pipelining to express lots of complex computations in a readable and maintainable manner, and if a language shoots me in the foot there, it's demoralizing.
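A minimal sketch of the linear-pipelining style described above, in Rust's method-chaining flavor (the function name is mine, for illustration): each step consumes the previous immutable result, so the computation reads top to bottom with no nesting and no magic symbols that can't participate in the chain.

```rust
// Sum the squares of the even numbers up to `upto`, expressed as
// one linear pipeline over immutable intermediate values.
fn sum_of_even_squares(upto: i32) -> i32 {
    (1..=upto)
        .filter(|n| n % 2 == 0)  // keep the evens: 2, 4, 6, 8, 10
        .map(|n| n * n)          // square them: 4, 16, 36, 64, 100
        .sum()                   // fold into one value: 220
}

fn main() {
    println!("{}", sum_of_even_squares(10)); // 220
}
```

The nested-call equivalent, `sum(map(filter(...)))`, says the same thing but must be read inside-out, which is exactly the readability cost the pipeline style avoids.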
There is a massive difference between Clojure, Prolog, and Forth.
The whole:
type name = value (type-focused)
name: type = value (name-focused)
var name type = value (qualifier-focused)
is so deep into the details of how syntax might look. If you are choosing between Kotlin and Go, it is for the platform, not the syntax. If you are deciding between Haskell, Idris, and Scheme, you do it with the syntax in mind.
C? Basically Algol. Pascal? Basically Algol, actually quite closely. Go? Basically Algol, via Pascal. Lua? Basically Algol, surprisingly closely.
Forth? Basically Lisp. Postscript? Basically Lisp.
I mean, once you decide the "flavor" (e.g.: typed, imperative, with a dash of functional and some oop for good measure), you could have more than one syntax and easily switch to whatever the reader wants.
We had an integration language in a product I worked on that had three flavors (you can check it here: https://docs.oracle.com/cd/E13154_01/bpm/docs65/pdf/OracleBP... , page 254)
The original syntax scared some people, so we had the compiler use the same AST with three different parsers: Original, Java and VB. The editor (which had syntax highlighting and auto completion) would let you see the code however you wanted.
You could even have a setting in the IDE that always showed the code as you wanted.
We even respected some weirdness in the spacing and indentation of comments and code when needed.
For some languages, like Rust, it may be a stretch, but for most vanilla languages you could easily re-skin them to look much more like something else that's comfy for whoever is looking at the code.
An interesting question, but the answer is "because it's a bad idea" that doesn't actually solve the problem.
That said, the right way to implement this is as a "transpiler" that compiles one syntax into another. And only the people who want to use it pay the costs.
There are many infamous examples of people using the C preprocessor to write near-Pascal or similar in C. It largely died out because it hindered effective communication about the code.
Throughout the article, OP seems baffled that people have aesthetic preferences. Well, yes, of course we do; dealing with ugly things is the computer's job.
It also comes across like OP hasn't seen a lot of examples of really interesting language syntax, i.e. things outside, shall we say, the extended Algol family. The discussion seems to accommodate brace-less languages like Python, but not e.g. the Lisp or Forth families.
> and thus just becomes a question of ergonomics or “optimizing for typing” (which is never the bottleneck).
It might not be a bottleneck in terms of time needed. But unpleasant syntax is annoying and breaks flow. Thoughts creep in about how you wish the language looked different and that you didn't have to type these other bits. (Which is why a lot of people feel so strongly about type inference.)
> From what I gather, this sentiment of not understanding why many “modern” languages still use semicolons is either:
OP seems to conflate "semicolon" with "visible, explicit token that separates statements"; there's no reason it couldn't be some other punctuation, after all. Describing Python's approach to parsing as "automatic semicolon insertion" is wild to me; the indented-block structure is the point, and statements are full lines by default. Semicolons in Python are a way to break that rule (along with parentheses, as noted), and they are rarely seen as useful anyway (especially given the multiple-assignment syntax).
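To illustrate the last point (a trivial sketch): Python's semicolon is a legal but rarely-used escape hatch, since idioms like multiple assignment already cover the common "two things on one line" cases.

```python
x = 1; y = 2   # legal, but unidiomatic: two statements on one line
x, y = y, x    # idiomatic multiple assignment (here: a swap)
print(x, y)    # 2 1
```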
> To allow for things like Allman braces, Odin allows for extra single newline in many places in its grammar, but only an extra single newline. This is to get around certain ambiguities between declaration a procedure type and a procedure literal
Right; and the point of Python's approach is to not need braces in the first place, and therefore sidestep any considerations of brace style. And when you do that, it turns out that you don't need to think nearly as hard about whether a newline should terminate a statement. It's a package deal.
> Maybe I don’t need to be as cynical and it is a lot simpler than all of that: first exposure bias. It’s the tendency for an individual to develop a preference simply because they became familiar with it first, rather that it be a rational choice from a plethora of options.
> However I do think there are rational reasons people do not like a syntax of a language and thus do not use it. Sometimes that syntax is just too incoherent or inconsistent with the semantics of the language. Sometimes it is just too dense and full or sigils, making it very hard to scan and find the patterns within the code.
For what it's worth, before I ever touched Python I had already used (in no particular order) multiple flavours of BASIC, Turing, probably at least two kinds of assembly, Scheme, C, C++, Java and Perl. To be fair, I had also used HyperTalk and Applescript, so maybe that does explain why I glommed onto Python. But BASIC came first.
In my mind, a mid-line semicolon is exactly the kind of sigil described here, and an end-of-line sigil is simply redundant. Multi-line statements should be the explicitly-marked exception, if only because long statements should be less common than shorter ones.
Most of the languages that are created anew end up being a clone of C or C++. Go is one of the few exceptions here; Rust is not an exception - syntactically it is basically C++, or even worse.
Sadly it is not possible to convince people who claim that syntax does not matter that it does. They just keep repeating that syntax is irrelevant. I don't think syntax is irrelevant at all. It has to do with efficiency of expression. Clear thoughts. Clear design. It is all interconnected.