(I evaluated semantic diff tools for use in Brokk but I ultimately went with standard textual diff; the main hangup that I couldn't get past is that semantic diff understandably works very poorly when you have a syntactically invalid file due to an in-progress edit.)
For the parsing problem, maybe something like Earley's algorithm that tries to minimize an error term?
You need this kind of robust parser for languages with preprocessors.
One very important rule is: no token can span more than one (possibly backslash-extended) line. This means having neither delimited comments (use multiple single-line comments; if your editor is too dumb for this you really need a new editor) nor multi-line strings (but you can do implicit concatenation of a string literal flavor that implicitly includes the newline; as a side-effect this fixes the indentation problem).
If you don't follow this rule, you might as well give up on robustness, because how else are you going to ever resynchronize after an error?
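To make the resynchronization point concrete, here is a minimal sketch in Python of a lexer for a made-up language that obeys the one-line-per-token rule. The token set, `lex_line`, and `lex` are all hypothetical names for illustration, and backslash-extended lines are left out to keep it short. Because every line is lexed independently, an unterminated string can damage only the line it sits on.

```python
import re

# Hypothetical token set for a toy language; no pattern can match a newline-spanning token.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>\#.*)
  | (?P<string>"(?:[^"\\]|\\.)*")
  | (?P<number>\d+)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>[-+*/=(){};,])
""", re.VERBOSE)

def lex_line(line):
    """Lex a single line; anything unrecognizable becomes one 'error' token."""
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            # Give up on the rest of this line only.
            tokens.append(("error", line[pos:]))
            break
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def lex(source):
    """Lex each line on its own, so the lexer is back in sync at every newline."""
    return [lex_line(line) for line in source.split("\n")]

result = lex('x = 1\ny = "oops\nz = 2')
```

In `result`, the second line ends in an error token for the unterminated string, while the first and third lines lex exactly as they would in a valid file.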
For parsing you can generally just aggressively pop on mismatched parens, unexpected semicolons, or on keywords only allowed in a top-ish level context. Of course, if your language is insane (like C typedefs), you might not be able to parse the next top-level function/class anyway. GNU statement-expressions, by contrast, are an actually useful thing that requires some thought. But again, language design choices can mitigate this (such as making classes values, making template arguments equivalent to array indexing, and making statements expressions).
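A rough sketch of the "aggressively pop" idea, in Python, working on an already-lexed token list. The language here is hypothetical: it assumes `;` never appears inside `(` or `[` (so no C-style `for (;;)`), and that `fn` and `class` are only legal at top level. `check_brackets` is an invented name, not anything from a real tool.

```python
CLOSERS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(CLOSERS.values())
TOP_LEVEL_KEYWORDS = {"fn", "class"}  # assumption: only valid at top level

def check_brackets(tokens):
    """Return (index, message) errors, recovering by popping the bracket stack."""
    stack, errors = [], []  # stack holds (opener, index) pairs

    def pop_as_error():
        op, j = stack.pop()
        errors.append((j, f"unclosed {op!r}"))

    for i, tok in enumerate(tokens):
        if tok in OPENERS:
            stack.append((tok, i))
        elif tok in CLOSERS:
            want = CLOSERS[tok]
            if any(op == want for op, _ in stack):
                # Mismatched closer: pop inner openers until the partner is on top.
                while stack[-1][0] != want:
                    pop_as_error()
                stack.pop()
            else:
                errors.append((i, f"unexpected {tok!r}"))
        elif tok == ";":
            # Unexpected semicolon inside ( or [: the expression was cut short.
            while stack and stack[-1][0] in ("(", "["):
                pop_as_error()
        elif tok in TOP_LEVEL_KEYWORDS:
            # A top-level keyword means everything still open was never closed.
            while stack:
                pop_as_error()
    while stack:
        pop_as_error()
    return errors

tokens = "fn f ( ) { x = ( 1 + 2 ; } fn g ( ) { }".split()
errors = check_brackets(tokens)
```

Here the only error reported is the unclosed `(` in the body of `f`; the definition of `g` is checked cleanly because the stack was already popped back at the `;`.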
An error-cost-minimizing dynamic programming parser could do this.
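As a toy illustration of the error-cost-minimizing idea (not a full Earley-style parser, and not what any tool mentioned here actually does), the sketch below uses interval dynamic programming in Python to find the minimum number of single-token repairs that make a bracket sequence balanced. `min_repair_cost` is a hypothetical name, and each insertion or deletion is assumed to cost 1.

```python
from functools import lru_cache

PAIRS = {"(": ")", "[": "]", "{": "}"}

def min_repair_cost(tokens):
    """Minimum number of bracket insertions/deletions needed to balance tokens."""
    # Only brackets matter for this toy grammar; drop everything else.
    tokens = tuple(t for t in tokens if t in PAIRS or t in PAIRS.values())

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Cheapest repair for the slice tokens[i:j].
        if i >= j:
            return 0
        # Option 1: tokens[i] is an error (delete it, or insert its partner).
        best = 1 + cost(i + 1, j)
        # Option 2: tokens[i] is an opener matched with some later closer at k.
        closer = PAIRS.get(tokens[i])
        if closer is not None:
            for k in range(i + 1, j):
                if tokens[k] == closer:
                    best = min(best, cost(i + 1, k) + cost(k + 1, j))
        return best

    return cost(0, len(tokens))
```

A real parser would run the same kind of minimization over grammar rules rather than just brackets, but the shape is the same: every span gets the cheapest explanation, with errors priced in.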
* this is still during lexing, not yet to parsing
* there are multiple valid token sequences that differ only by a single character at the start of the file. This is very common with Python multi-line strings in particular, since they are widely used as docstrings.
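The second bullet is easy to reproduce with Python's own `tokenize` module. In this small sketch (the sample sources and the helper `code_names` are made up for illustration), both files lex without error, yet adding one `"` at the very start swaps which lines are code and which are string contents.

```python
import io
import tokenize

def code_names(source):
    """Return the identifiers the tokenizer sees as code (NAME tokens)."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in toks if t.type == tokenize.NAME]

# File A starts with an empty string literal "" followed by a triple-quoted string.
file_a = '""\nx = 1\n"""\ny = 2\n# """\n'
# File B is file A with a single extra quote character prepended.
file_b = '"' + file_a

names_a = code_names(file_a)  # x is code; "y = 2" is inside a string
names_b = code_names(file_b)  # "x = 1" is inside a string; y is code
```

Since both readings are valid, the lexer has no error to recover from, so a diff tool cannot tell from the token stream alone which one the author meant mid-edit.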
Difftastic itself is great as well! The author wrote up nice posts about its design: https://www.wilfred.me.uk/blog/2022/09/06/difftastic-the-fan..., https://difftastic.wilfred.me.uk/diffing.html.