Can you describe what ways this might be beyond just breaking up code into smaller functions?
An example of this is that Models tend to create unit tests that are mostly just mock + reimplementations of imperative code in the functions they test. If you could force behavioral testing by only allowing test creation agents to accessing the function docstring, name/args/types, branch statements and log events, you could potentially avoid these classes of weak tests being created. But that would mean that your code has to optimize to providing signal via those elements.
This is just an example I'm not sure that would actually work.
I don't know if you can reliably do that with static analysis tho. I would be interested in some sort of debug attachment like process that does a code coverage type evaluation. If you can't tell this is at least on the edge of (if not past) my depth of expertise
We're on the structural side right now with call graphs and dependency edges, but a hybrid approach that combines the static graph with runtime instrumentation to fill in the gaps is definitely something I'd love to explore. Thanks for the feedback.
Things with LLMs break because our infra was always designed for analyzing lines(tools like grep fuzzy matching) and working on quite small sections of code. LLMs struggle with this in cases when they have to analyze different parts of a codebase they either get too much context where you're throwing whole files at them, or too little where they only see the function in isolation, with no real understanding of how the pieces actually connect to each other.
That's really the gap sem is trying to fill. With sem impact you can give an agent the precise blast radius of a change instead of guessing which files matter, and sem diff --patch lets you enforce that a change only touches specific functions and reject anything that bleeds outside that boundary something that's really hard to do with line-level diffs.
Your testing idea is actually closer than you might think. sem already extracts entity signatures, dependencies, and call graphs, so you could build a harness that gives the test-writing agent only the function signature with its dependency graph and behavioral contract, while withholding the implementation entirely. That would force the agent toward behavioral tests because it literally can't see the internals to mock them. I haven't built this harness myself yet but sem graph and sem inspect expose everything you'd need.
The general principle is that sem gives you a structural map of the codebase to both constrain and validate what the model produces, rather than treating code as flat text and hoping the model figures out the relationships on its own.
Another usecase can be about figuring out dead code present in the codebase.
Edit: Also one last thing because I started working on this while solving the fundamental issue of why merge conflicts were occuring with git, so you might also like the merge drive I open sourced on the same Github org - Weave
I think there’s an opportunity to use an AST diff system for code forges where you don’t present the user with line diffs in the UI — or at least not as the first diff the user sees.
I firmly believe code review should happen in your editor.
I can also give my thought process, because I was more interested in figuring out the model's inherent search results and understanding without sem.
$ sem impact authenticateUser
⊕ function authenticateUser (src/auth/login.ts:26)
→ depends on: db.findUser, rateLimiter.check
← used by: loginRoute, authMiddleware
! 42 entities transitively affected
ᛋ 7 tests affected
Okay that is pretty cool. I appreciate this information as a human also.I got about halfway through reinventing something like this last year (minus the git part). I was trying to make a graph of dependencies in the codebase. (I actually got pretty far with a regex!)
Animats•1h ago
rohanucla•1h ago
So instead of line level analysis the whole granularity of seeing changes and tracking thing shifts to entities. It helps in attention mapping of your agent and lets you track the changes faster.
LSPs have been doing it for quite long but using treesitters is faster even tho type awareness is not great with this approach but overall working across multiple languages with a single tool can be quite helpful.