
Diffsitter – A Tree-sitter based AST difftool to get meaningful semantic diffs

https://github.com/afnanenayet/diffsitter
151•mihau•7mo ago

Comments

fjfaase•7mo ago
Discussed before on https://news.ycombinator.com/item?id=27875333
koozz•7mo ago
I thought I'd seen it before. I use Difftastic myself: amazing diffs. https://github.com/Wilfred/difftastic
alwillis•7mo ago
Same.
jbellis•7mo ago
If you're looking for something more complete and actively maintained, check out https://github.com/GumTreeDiff/gumtree.

(I evaluated semantic diff tools for use in Brokk but I ultimately went with standard textual diff; the main hangup that I couldn't get past is that semantic diff understandably works very poorly when you have a syntactically invalid file due to an in-progress edit.)

pests•7mo ago
I watched a video long ago about how the Roslyn C# compiler handled this but I forget the details.
pfdietz•7mo ago
The interesting problem here would be how do you produce a robust parse tree for invalid inputs, in the sense of stably parsing large sections of the text in ways that don't change too much. The tree would have to be an extension of an actual parse tree, with nodes indicating sections that couldn't be fully parsed or had errors. The diff algorithm would have to also be robust in the face of such error nodes.

For the parsing problem, maybe something like Earley's algorithm that tries to minimize an error term?

You need this kind of robust parser for languages with preprocessors.
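A minimal sketch of the "parse tree with error nodes" idea, restricted to bracket nesting only (no real grammar, which is a big simplification). A stray closer becomes an explicit error node. A closer that matches an outer opener pops back to it and flags the unclosed groups in between, so the rest of the tree stays put. `parse`, `Node`, and the `ok` flag are hypothetical names, and this is not how any tool mentioned in the thread works:

```python
from dataclasses import dataclass, field
from typing import List, Union

OPEN = {"(": ")", "[": "]", "{": "}"}
CLOSE = set(OPEN.values())


@dataclass
class Node:
    kind: str                      # "root", "group", or "stray" (unmatched closer)
    text: str = ""                 # the delimiter that opened the group, or the stray char
    ok: bool = True                # False marks an error node
    children: List[Union["Node", str]] = field(default_factory=list)


def parse(src: str) -> Node:
    """Bracket-only toy parser that never fails: errors become nodes in the tree."""
    root = Node("root")
    stack = [root]
    for ch in src:
        if ch in OPEN:
            group = Node("group", ch)
            stack[-1].children.append(group)
            stack.append(group)
        elif ch in CLOSE:
            # Find the innermost open group this closer can match (index 0 is the root).
            match = next((i for i in range(len(stack) - 1, 0, -1)
                          if OPEN[stack[i].text] == ch), None)
            if match is None:
                stack[-1].children.append(Node("stray", ch, ok=False))
            else:
                for unclosed in stack[match + 1:]:
                    unclosed.ok = False    # never closed: flag it but keep its contents
                del stack[match:]          # pop back to the matching group's parent
        else:
            stack[-1].children.append(ch)
    for unclosed in stack[1:]:
        unclosed.ok = False                # still open at end of input
    return root
```

For `parse("f(a[b)c")`, the `[` group is flagged as an error, but `(` still closes normally and `c` lands back at the top level instead of being swallowed.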

o11c•7mo ago
Unfortunately, this depends on making good decisions during language design; it's not something you can retrofit with a new lexer and parser.

One very important rule is: no token can span more than one (possibly backslash-extended) line. This means having neither delimited comments (use multiple single-line comments; if your editor is too dumb for this you really need a new editor) nor multi-line strings (but you can do implicit concatenation of a string literal flavor that implicitly includes the newline; as a side-effect this fixes the indentation problem).

If you don't follow this rule, you might as well give up on robustness, because how else are you going to ever resynchronize after an error?

For parsing you can generally just aggressively pop on mismatched parens, unexpected semicolons, or keywords only allowed in a top-ish-level context. Of course, if your language is insane (like C typedefs), you might not be able to parse the next top-level function/class anyway. GNU statement-expressions, by contrast, are an actually useful thing that requires some thought. But again, language design choices can mitigate this (such as making classes values, making template arguments equivalent to array indexing, and making statements expressions).
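To make the resynchronization argument concrete, here is a toy comparison that assumes a made-up token grammar. A lexer that works line by line (only `#` comments and single-line strings) confines an unterminated string to one error token on its own line. A lexer with `/* */` comments lets one unclosed `/*` swallow the rest of the file. Both lexers and all names here are illustrative, not any real language's lexer:

```python
import re

# Hypothetical toy token grammar. Every alternative is confined to one line:
# '#' comments run to end of line, and a string that is not closed on its
# own line becomes a single "bad" token instead of swallowing later lines.
LINE_TOKEN = re.compile(
    r'\s*(?:(?P<comment>#.*)|(?P<string>"[^"\n]*")|(?P<bad>"[^"\n]*)'
    r'|(?P<word>\w+)|(?P<punct>\S))'
)
WORD_OR_CHAR = re.compile(r"\w+|\S")


def lex_line(line):
    tokens, pos = [], 0
    while True:
        m = LINE_TOKEN.match(line, pos)
        if m is None:          # only trailing whitespace (or nothing) is left
            return tokens
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()


def lex_by_line(src):
    """Line-local lexing: an error can never leak past a newline."""
    return [lex_line(line) for line in src.split("\n")]


def lex_with_block_comments(src):
    """Contrast: with /* */ comments, one unclosed '/*' eats the rest of the file."""
    out, pos = [], 0
    while pos < len(src):
        if src.startswith("/*", pos):
            end = src.find("*/", pos + 2)
            end = len(src) if end == -1 else end + 2
            out.append(("comment", src[pos:end]))
            pos = end
        elif src[pos].isspace():
            pos += 1
        else:
            m = WORD_OR_CHAR.match(src, pos)
            out.append(("tok", m.group()))
            pos = m.end()
    return out
```

With `lex_by_line('a "b\nc\nd')`, only the first line holds a `bad` token and `c` and `d` lex normally. `lex_with_block_comments("a /* b\nc\nd")` instead returns a single comment token covering everything after `a`.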

pfdietz•7mo ago
> how else are you going to ever resynchronize after an error?

An error-cost-minimizing dynamic programming parser could do this.
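The error-cost-minimizing idea can be illustrated on the smallest possible grammar, nested brackets. This sketch assumes unit cost per deleted or inserted bracket. An interval DP (O(n³) in the number of brackets) picks the globally cheapest set of characters to treat as errors, instead of committing greedily at the first mismatch. A real parser would run this over a full grammar (Earley- or CYK-style), and per the follow-up below, possibly over characters rather than tokens. This is a toy instance only, and `min_error_cost` is a hypothetical name:

```python
from functools import lru_cache

PAIRS = {"(": ")", "[": "]", "{": "}"}


def min_error_cost(s: str):
    """Return (cost, error_positions): the fewest bracket characters that must be
    treated as errors (each fixable by one deletion or one insertion) so that the
    remaining brackets in s are well nested. Non-bracket characters are ignored."""
    br = [(i, c) for i, c in enumerate(s) if c in PAIRS or c in PAIRS.values()]

    @lru_cache(maxsize=None)
    def best(i, j):
        # Minimal cost for brackets br[i:j], plus the source positions marked as errors.
        if i >= j:
            return 0, ()
        # Option 1: br[i] is an error.
        cost, errs = best(i + 1, j)
        result = (cost + 1, (br[i][0],) + errs)
        # Option 2: br[i] opens a pair and matches some later closer br[k].
        if br[i][1] in PAIRS:
            for k in range(i + 1, j):
                if br[k][1] == PAIRS[br[i][1]]:
                    c1, e1 = best(i + 1, k)   # inside the pair
                    c2, e2 = best(k + 1, j)   # after the pair
                    if c1 + c2 < result[0]:
                        result = (c1 + c2, e1 + e2)
        return result

    return best(0, len(br))
```

`min_error_cost("f(a[b)c")` reports a single error at position 3 (the `[`). Treating the `(` as the error instead would have cost 3.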

o11c•7mo ago
That fundamentally misunderstands the problem in multiple ways:

* this is still during lexing, not yet to parsing

* there are multiple valid token sequences that differ only by a single character at the start of the file. This is very common with Python multi-line strings in particular, since they are widely used as docstrings.

pfdietz•7mo ago
One could fold lexing into the parsing and do error cost minimization on both.
conartist6•7mo ago
Error recovery is a dead-end tech for all the reasons you say.

If people want to move forward they'll look past it. Garbage in, garbage out.

WorldMaker•7mo ago
I think the easiest trick here is to stop thinking about it as a parsing problem and consider it only as a lexing problem. A good lexer either doesn't throw out errors or minimizes error token states, and a good lexer gets back to a regular stream of tokens as quickly as it can. This is why we trust "simple" lexers as the syntax highlighters in most IDEs: they are fast, and they handle unfinished and malformed documents just fine (and we write plenty of those in our editors).

My experience many years back with using just a syntax highlighting tuned lexer to build character-level diffs showed a lot of great promise: https://github.com/WorldMaker/tokdiff
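The lexer-only approach can be sketched with the standard library. This assumes a made-up, highlighter-style token regex that can never fail: an unterminated string just becomes one token that ends at the newline. Both versions are tokenized, and `difflib.SequenceMatcher` runs over the token lists instead of over lines. This illustrates the idea and is not tokdiff's actual implementation; `TOKEN` and `token_diff` are hypothetical names:

```python
import difflib
import re

# Hypothetical highlighter-style token pattern. The final "." alternative means
# every character lands in some token, so malformed input still lexes.
TOKEN = re.compile(r'"(?:\\.|[^"\\\n])*"?|\w+|\s+|.', re.S)


def tokens(src):
    return TOKEN.findall(src)


def token_diff(old, new):
    """Return (tag, old_text, new_text) for every non-equal run of tokens."""
    a, b = tokens(old), tokens(new)
    ops = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag, "".join(a[i1:i2]), "".join(b[j1:j2])))
    return ops
```

Renaming `qty` to `quantity` then shows up as one token replacement rather than a whole changed line. An unterminated string on the old side is still just one token, so the diff stays local.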

Timwi•6mo ago
Just use an AST if it parses, and fall back to plain text diffs if it doesn't.
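For Python sources, this suggestion is easy to sketch with the standard `ast` module (Python 3.9+ for the `indent=` argument). When both versions parse, the normalized AST dumps are diffed, so formatting-only edits disappear. When either side fails to parse, it falls back to a plain line diff. As WorldMaker points out later in the thread, mixing the two modes can still cause churn, and this sketch shows only the mechanism. `semantic_or_text_diff` is a hypothetical name:

```python
import ast
import difflib


def semantic_or_text_diff(old: str, new: str):
    """Diff normalized ASTs when both sides parse; otherwise fall back to a line diff."""
    try:
        # ast.dump omits line/column attributes by default, so layout changes vanish.
        old_view = ast.dump(ast.parse(old), indent=1).splitlines()
        new_view = ast.dump(ast.parse(new), indent=1).splitlines()
        mode = "ast"
    except SyntaxError:
        old_view, new_view = old.splitlines(), new.splitlines()
        mode = "text"
    diff = list(difflib.unified_diff(old_view, new_view, lineterm=""))
    return mode, diff
```

Reformatting `x=1+2` as `x = (1 + 2)` yields `("ast", [])`. An in-progress edit such as `x = (1` drops back to `"text"` mode.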
ilyagr•7mo ago
In case anybody happens to be interested in testing `gumtree` with https://github.com/jj-vcs/jj, I think I got them to work together. See https://github.com/GumTreeDiff/gumtree/wiki/VCS-Integration#... (assumes Docker).
affyboi•7mo ago
Note that diffsitter isn’t abandoned or anything. I took a year off working and just started a new job so I’ve been busy. I’ve got a laundry list of stuff I want to do with this project that will get done (at some point)
aiiizzz•7mo ago
This is cool, I really think this kind of thing integrated with LLMs for code editing will be wonderful. Days of manually typing code are coming to an end. I was looking for something better than meld, and this might be it.
the__alchemist•7mo ago
Is there an anti-tree-sitter version too?
davepeck•7mo ago
yes, although it's sort of the same as Context-Free-Typing-sitter
esafak•7mo ago
Someone make a semantic diff splitter please! Break up big commits into small, atomic, meaningful ones.
0x457•7mo ago
Well, that's what git-patch is: https://patch-diff.githubusercontent.com/raw/denoland/deno/p...
esafak•7mo ago
I can't make sense of that link. How many parts was the diff split up into, and along what lines?
0x457•7mo ago
Yeah, I don't know why I linked that as an example. Wanted to show structure of a patch. Each commit of a patch already has everything ready to be processed and chunked IF you keep them - small, atomic, semantically meaningful. As in do smaller commits.
mdaniel•7mo ago
> > Someone make a semantic diff splitter please! Break up big commits into small, atomic, meaningful ones.

> Each commit of a patch already has everything ready to be processed and chunked IF you keep them - small, atomic, semantically meaningful. As in do smaller commits.

Reads like:

User1: I need help with my colleagues who do not make independent, small, semantically intact commits

User2: well, have you tried making smaller, more independent, semantically intact commits?

---

My interpretation of the wish is to convert this, where they have intermixed two semantically independent changes in one diff:

    --- a/alpha.py
    +++ b/alpha.py

     def doit():
    -    awesome = 3.14
    +    awesome = 4.56

    -    print("my dog is fluffy")
    +    print("my cat is fluffy")

into this:

    --- a/alpha.py
    +++ b/alpha.py

     def doit():
    -    awesome = 3.14
    +    awesome = 4.56

         print("my dog is fluffy")

    --- a/alpha.py
    +++ b/alpha.py

     def doit():
         awesome = 3.14

    -    print("my dog is fluffy")
    +    print("my cat is fluffy")

where each one could be cherry-picked at will because they don't semantically collide.

The semantics part would be knowing that this one could not be split in that manner, because the cherry-pick would change more than just a few lines: it would change the behavior.

    --- a/alpha.py
    +++ b/alpha.py

     def doit():
    -    the_weight = 3.14
    +    the_weight = 4.56

    -    print("my dog weighs %f" % the_weight)
    +    print("my cat weighs %f" % the_weight)

I'm sure these are very contrived examples, but they're the smallest ones I could whip up offhand.
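The "semantics part" described above could be approximated, very roughly, by a purely syntactic heuristic. Each changed statement's set of identifiers (via Python's `ast`, ignoring builtins such as `print`) is treated as its footprint, and changes whose footprints overlap are unioned together. In the first example, the `awesome` edit and the `print` edit share nothing and land in separate groups. In the `the_weight` example, they are merged. `names_in` and `group_changes` are hypothetical names. The heuristic ignores aliasing, attribute access, and side effects, so it sketches the idea and is not a safe commit splitter:

```python
import ast
import builtins

BUILTINS = set(dir(builtins))


def names_in(stmt_src: str) -> set:
    """Identifiers a changed statement mentions, ignoring builtins like print."""
    tree = ast.parse(stmt_src.strip())
    return {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)} - BUILTINS


def group_changes(changed_stmts):
    """Union-find over changed statements that share identifiers.
    Each resulting group is a candidate for its own commit."""
    parent = list(range(len(changed_stmts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    seen = {}  # identifier -> index of the first statement that mentioned it
    for i, src in enumerate(changed_stmts):
        for name in names_in(src):
            if name in seen:
                parent[find(i)] = find(seen[name])
            else:
                seen[name] = i

    groups = {}
    for i in range(len(changed_stmts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```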
ethan_smith•7mo ago
Check out git-imerge or git-absorb which can help with this problem by intelligently splitting or absorbing changes into the right commits.
alwillis•7mo ago
First time I used absorb was in Mercurial back in the day: https://gregoryszorc.com/blog/2018/11/05/absorbing-commit-ch...
pmkary•7mo ago
What a genius idea.
affyboi•7mo ago
Nah I think most people could make something like this in a weekend
pmkary•6mo ago
As I said: genius "idea" not
vrm•7mo ago
This is neat! I think in general there are really deep connections between semantically meaningful diffs (across modalities) and supervision of AI models. You might imagine a human-in-the-loop workflow where the human makes edits to a particular generation and then those edits are used as supervision for a future implementation of that thing. We did some related work here: https://www.tensorzero.com/blog/automatically-evaluating-ai-... on the coding use case, but I'm interested in all the different approaches to the problem, especially in less structured domains.
dcre•7mo ago
See also https://mergiraf.org/ for a tool that uses ASTs to resolve (some) merge conflicts.
Iwan-Zotow•7mo ago
Integration with VS Code?
1-more•7mo ago
See also difftastic https://difftastic.wilfred.me.uk/languages_supported.html
ilyagr•7mo ago
https://github.com/Wilfred/difftastic/wiki/Structural-Diffs is a nice list of alternatives.

Difftastic itself is great as well! The author wrote up nice posts about its design: https://www.wilfred.me.uk/blog/2022/09/06/difftastic-the-fan..., https://difftastic.wilfred.me.uk/diffing.html.

mertleee•7mo ago
This is really cool.

Although, for more exotic applications that parse structural data, I've found Langium far more capable as a platform. TypeScript is also a pleasant departure from common AST tools.

john_max_1•7mo ago
How does it compare to diffmerge?
jacobr•7mo ago
Could the next-gen version control system just store ASTs? Does this already exist?

Every user gets their own preferred formatting, and linters and tools could operate on already-parsed trees
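As a small illustration of both sides of this idea, Python's standard `ast` module can stand in for an AST-storing VCS. This is an assumption: Unison and BABLR work differently. Two differently formatted sources reduce to the same tree, and `ast.unparse` (Python 3.9+) can re-render that tree in one canonical style. The same sketch also shows a cost: the `# units` comment does not survive the round trip, because Python's AST does not retain comments. `canonical` and `render` are hypothetical names:

```python
import ast


def canonical(src: str) -> str:
    """What an AST store might compare: the tree, dumped without positions."""
    return ast.dump(ast.parse(src))


def render(src: str) -> str:
    """What a reader might see: source regenerated from the tree in one style."""
    return ast.unparse(ast.parse(src))


messy = "total=price*( qty )  # units\n"
tidy = "total = price * qty\n"
same_tree = canonical(messy) == canonical(tidy)   # formatting differences vanish
rendered = render(messy)                          # the comment is lost here
```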

haradion•7mo ago
The Unison programming language is built around that idea: https://www.unison-lang.org/docs/the-big-idea/
conartist6•7mo ago
BABLR is building that! It's entirely fair to say that BABLR takes Unison's approach and allows it to be used with every programming language.
williamdclt•7mo ago
This is an idea that comes back often, and has merit of course.

The thing is that this means sacrificing the enormous advantage of plaintext, which is that it is enormously interoperable: we use a huge quantity of text-based tools to work with source code, including non-code-specific ones (grep, sed…)

Also, code is meant to be read by humans: things like alignment and line breaks really do matter (although opinions often differ about the “right” way)

And of course the lesser (?) problem of invalid ASTs.

conartist6•7mo ago
These are all solvable problems, and I know because I have built a solution that demonstrates how they can all be solved at the same time.
WorldMaker•7mo ago
I don't think invalid ASTs are a "lesser" problem, it is a pretty big one: we want to be able to source control work in progress and partially complete things. There's a lot of reasons you might not want to or be able to finish a bit of code and yet you still want to record what you've done and where you are (to pick it back up later, to get other developers' eyes on a sketch or an outline, to save it to backup systems, etc). Those are often important steps in development, though it is easy to forget about how common they are when you think about software as finished/buildable artifacts only.

I know a lot of people think source control should only have buildable code, but that's what CI processes are for and people use source control (and diffs) for a lot of things that don't need to pass CI 100% of the time.

Timwi•6mo ago
I don't understand why it can't just use an AST if it parses, and fall back to plain text diffs if it doesn't.
WorldMaker•6mo ago
Churn in the diffs is a big reason, if the point of wanting a semantic diff is to have a smarter diff for smarter patches/merges. The smartness of your merge is generally a lowest common denominator operation. If most of your intermediate diffs are dumb plain text diffs, your final merge operation is to some extent mostly going to still be a dumb plain text merge.

That may be fine if you are happy with the plain text status quo, but if your goal is to avoid or minimize merge conflicts (as most people want when talking about semantic diff), you don't really solve that as well as you'd like.

(Additionally, though it is much less of a concern for git's on-disk storage, patch size does matter for some git-based email flows and for other VCSes, and a consistent style of diff between patches can be a useful storage or transfer optimization. Plain text diffs are likely to produce much bigger patches than what you might get from a semantic diff, and a mix of merges between semantic and plain text diffs is often the worst of both worlds for overall patch size, as they churn against each other.)

tempfile•7mo ago
Isn't this one of the basic ideas of Lisp?
foo42•7mo ago
you might want to check out eyg lang (eat your greens), as I think the idea is explicitly that syntax is a user preference and the AST is the _real_ language
modderation•7mo ago
This looks interesting! I've been building a similar tool that uses TreeSitter to follow changes to AST contents across git commits, with the addition of tying the node state to items in another codebase. In short, if something changes upstream, the corresponding downstream functionality can be flagged for review.

The ultimate goal is to simplify the building and maintenance of a port of an actively-maintained codebase or specification by avoiding the need to know how every last upstream change corresponds to the downstream.

Just from an initial peek at the repo, I might have to take a look at how the author is processing their TreeSitter grammars -- writing the queries by hand is a bit of a slow process. I'm sure there are other good ideas in there too, and Diffsitter looks like it'd be perfect for displaying the actual semantic changes.

Early prototype, heavily relies on manual annotations in the downstream: https://github.com/NTmatter/rawr

(yes, it's admittedly a "Rewrite it in Rust" tool at the moment, but I'd like it to be a generic "Rewrite it in $LANG" in the future)
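Here is a rough sketch of the "flag downstream when upstream changes" loop. It uses Python's `ast` module as a stand-in for Tree-sitter, which is an assumption: rawr itself uses Tree-sitter and its own annotations. Each upstream function is fingerprinted by hashing its AST, two commits are compared, and downstream items linked to any function whose fingerprint changed (or that disappeared) are reported. Because the hash ignores positions and comments, pure reformatting does not trigger a review. `fingerprints`, `flag_for_review`, and the `downstream_links` mapping are hypothetical:

```python
import ast
import hashlib


def fingerprints(src: str) -> dict:
    """Map each top-level function name to a hash of its AST (positions and comments excluded)."""
    out = {}
    for node in ast.parse(src).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out[node.name] = hashlib.sha256(ast.dump(node).encode()).hexdigest()
    return out


def flag_for_review(old_upstream: str, new_upstream: str, downstream_links: dict) -> list:
    """downstream_links maps a downstream item to the upstream function it ports
    (a hypothetical annotation format). Returns the items whose source changed."""
    before, after = fingerprints(old_upstream), fingerprints(new_upstream)
    changed = {name for name in before.keys() | after.keys()
               if before.get(name) != after.get(name)}
    return sorted(item for item, upstream in downstream_links.items() if upstream in changed)
```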