Lately I’ve been getting frustrated reviewing AI-generated changes: even when the diff looks "small", it can be very hard to tell what actually changed in behavior or structure.
I started experimenting with a different approach: comparing two snapshots of the code (baseline and current) instead of raw line diffs. Each snapshot captures a rough API shape and a behavior signal derived from the AST. The goal isn’t deep semantic analysis, but something fast that can signal whether anything meaningful actually changed.
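To make that concrete, here’s a rough sketch of the kind of snapshot/compare pass I mean, written in Python on top of the stdlib ast module. This is illustrative only, not my actual tool, and names like snapshot/compare are placeholders: record name + argument names as the API shape, hash each function body’s AST dump as the behavior signal, then diff the two maps.

    import ast
    import hashlib

    def snapshot(source: str) -> dict:
        """Capture a rough API shape and a behavior signal per function."""
        tree = ast.parse(source)
        result = {}
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # API shape: function name plus positional argument names
                shape = (node.name, tuple(a.arg for a in node.args.args))
                # Behavior signal: hash of the body's AST dump; pure
                # reformatting doesn't change it, structural edits do
                dump = ast.dump(ast.Module(body=node.body, type_ignores=[]))
                digest = hashlib.sha256(dump.encode()).hexdigest()[:12]
                result[node.name] = {"shape": shape, "behavior": digest}
        return result

    def compare(baseline: dict, current: dict) -> list:
        """Emit coarse signals about what changed, not judgments."""
        signals = []
        for name in sorted(baseline.keys() | current.keys()):
            old, new = baseline.get(name), current.get(name)
            if old is None:
                signals.append("added: " + name)
            elif new is None:
                signals.append("removed: " + name)
            elif old["shape"] != new["shape"]:
                signals.append("API shape changed: " + name)
            elif old["behavior"] != new["behavior"]:
                signals.append("behavior signal changed: " + name)
        return signals

Usage is just compare(snapshot(old_src), snapshot(new_src)), which spits out lines like "behavior signal changed: parse_config" instead of a wall of hunks.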
It’s intentionally shallow and non-judgmental — just signals, not verdicts.
At the same time, I see more and more LLM-based tools helping with PR reviews. Having probabilistic changes reviewed by probabilistic tools feels a bit dangerous to me.
Curious how others here think about this:
– Do diffs still work well for AI-generated changes?
– How do you review large AI-assisted refactors today?
nuky•1h ago
I ran into this problem while reviewing AI-generated refactors and started wondering whether we’re still reviewing the right things. Mostly curious how others approach this.