Hi HN — I built semantic-diff for myself, and it turned out surprisingly useful.
Regular git diff shows what changed. semantic-diff tries to answer:
– why was this change made?
– what could break?
– what should the reviewer focus on?
It gives you a ranking from critical to low, with review questions prioritized.
The funny part: I ran it on its own commits during development. The tool roasted me harder than any reviewer I've had )))
Runs locally, hooks into pre-push, also works as a GitHub Action.
Would love feedback, especially from people doing code review at scale.
Comments
forgotpwd16•1w ago
That's commit review, not semantic diff, and the "why" behind changes should be in the commit body (some will argue in code comments too) rather than figured out after the fact, which won't even work. E.g.
https://github.com/tkenaz/semantic_diff/blob/main/semantic_d...
The Intent in this report just paraphrases the well-written commit message. But what about things not written? Funnily enough, the tool does catch the drift:
> 2. Why was 196 lines of CLAUDE_NOTES.md removed? Was this intentional cleanup or accidental deletion of important project context?
> The diff shows massive content reduction in project documentation with no explanation in the commit message. This could represent loss of architectural decisions, bug tracking, or development history.
So... why? No answer is given, because none is possible.
> 6. Why skip dependabot PRs specifically? Shouldn't dependabot updates also be semantically analyzed to catch breaking changes in dependencies?
> Dependabot PRs could introduce security vulnerabilities or breaking API changes that semantic analysis would catch. Skipping them entirely might miss important issues.
Good question. But, again, no answer.
Also maybe add options for the extent of analysis, which sections to show, and perhaps a prompt tweak to cut redundant information. E.g. in the previous report, the "CLAUDE_NOTES.md massive content reduction" appears in Impact Map, Risk Assessment, and Review Questions. Plus the entire sentence "Massive reduction in file size (212 lines removed, 16 added) suggests project documentation/status was significantly refactored or cleaned up" simply states what is obvious from the deleted lines in Files Changed. Some may like this; others (incl. me) won't. Personally I'd have preferred the overall report to be shorter.
The indirect impacts section seems interesting. There are cases where one may want to see the modules impacted by a change, e.g. a function whose result changes, making another function fail. But this is probably doable heuristically rather than requiring an LLM to read the codebase.
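Rough sketch of the kind of heuristic I mean (not the tool's code; the repo-scan approach and the find_callers/changed-function names are just made up for illustration): given a function name pulled out of the diff, list every direct call site in the repo without any LLM.

    # Rough heuristic sketch: list direct call sites of a changed function.
    import ast
    from pathlib import Path

    def find_callers(repo_root, changed_func):
        callers = []
        for path in Path(repo_root).rglob("*.py"):
            try:
                tree = ast.parse(path.read_text(encoding="utf-8"))
            except (SyntaxError, UnicodeDecodeError):
                continue
            for node in ast.walk(tree):
                if isinstance(node, ast.Call):
                    callee = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
                    if callee == changed_func:
                        callers.append((str(path), node.lineno))
        return callers

    # e.g. find_callers(".", "generate_report")  # hypothetical function name

From there you could walk one or two levels up the call graph and you already have most of an "indirect impacts" section.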
mvyshnyvetska•1w ago
Fair points :). The naming is 'evolutionary' — started as 'semantic diff' because it analyzes meaning not just lines, but 'commit review' is more accurate for what it does now.
You're right about the redundancy — same issue appearing in 3 sections is noise. Adding output config (sections to include, verbosity level) is on the list.
The 'why was this deleted' questions — yeah, the tool can't answer, but surfacing the question for the reviewer has value. At least you know to ask the author.
Good callout on dependabot. Worth reconsidering.
Thanks for actually trying it and giving specific feedback.
mvyshnyvetska•1w ago
Update: just shipped --brief flag. Thanks for the push!
forgotpwd16•1w ago
>the tool can't answer, but surfacing the question for the reviewer has value
That's true. Could even expand on this and add a --check-msg mode that returns 0/1 (good/bad) depending on whether commit messages explain the reasoning for the changes they make. Although linting commit messages is the last thing I want to do.
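For reference, the dumbest possible version of that check (toy sketch; --check-msg doesn't exist, and "has a body beyond the subject line" is only a rough proxy for "explains why"):

    # Toy check: exit 1 if the latest commit message is subject-only,
    # i.e. has no body where the reasoning could live.
    import subprocess, sys

    msg = subprocess.run(
        ["git", "log", "-1", "--format=%B"],
        capture_output=True, text=True, check=True,
    ).stdout
    lines = [line for line in msg.splitlines() if line.strip()]
    sys.exit(0 if len(lines) > 1 else 1)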
mvyshnyvetska•1w ago
Yeah, that's more process-tooling territory — keeping this dev-focused for now.
SamiBuilds•1w ago
Hi! Your semantic diff idea is really cool. I actually developed a tool called [API GEN] which helps analyze the impact of code changes, detect potential security issues, and prioritize review tasks automatically. It works locally, integrates with CI/CD, and can provide insights alongside semantic-diff to make large-scale code review more efficient.
I'd love to hear your thoughts on combining semantic analysis with automated impact and risk detection; it could be a powerful combo for reviewers.