The problem is that diffs are the wrong format. A PR might change how three buttons behave; staring at green and red lines to understand that is crazy.
The core reason we built this is that today's products are built on yesterday's assumptions. 100x more code flowing through the same review systems demands 100x more human attention, and human attention can't scale to meet that, so we built something different. People demonstrably engage more with video than with text.
So we built and RL-trained an agent that watches your preview deployment when you open a PR, clicks around the parts that changed, and posts a video in the PR itself.
The hardest part was figuring out where changed code actually lives in the running app. A diff can say Button.tsx line 47 changed, but that doesn't tell you how to find that button on the page. We walk React's Fiber tree, where each node maps back to a source file, so we can trace changes down to bounding boxes for the corresponding DOM elements. We then reward the model for getting those elements on screen and interacting with them.
This obviously only works with React, so we have to get more clever when generalizing to other frameworks.
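Roughly, the mapping step looks like the sketch below. It assumes a dev build where fibers still carry _debugSource from the JSX source transform; the helper names are made up and the real pipeline is more involved:

    type ChangedBox = { file: string; el: Element; rect: DOMRect };

    // React stashes fibers on DOM nodes under keys like "__reactFiber$<hash>"
    // (and "__reactContainer$<hash>" on the mount node).
    function getRootFiber(mount: Element): any | null {
      const key = Object.keys(mount).find(
        k => k.startsWith('__reactContainer$') || k.startsWith('__reactFiber$')
      );
      return key ? (mount as any)[key] : null;
    }

    // Walk down from a fiber to the first real DOM element it rendered.
    function nearestHostElement(fiber: any): Element | null {
      let node = fiber;
      while (node) {
        if (node.stateNode instanceof Element) return node.stateNode;
        node = node.child;
      }
      return null;
    }

    // Collect bounding boxes for every fiber whose source file was touched by the diff.
    function boxesForChangedFiles(root: any, changedFiles: Set<string>): ChangedBox[] {
      const out: ChangedBox[] = [];
      const walk = (fiber: any) => {
        if (!fiber) return;
        const src = fiber._debugSource;  // { fileName, lineNumber } in dev builds
        if (src && changedFiles.has(src.fileName)) {
          const el = nearestHostElement(fiber);
          if (el) out.push({ file: src.fileName, el, rect: el.getBoundingClientRect() });
        }
        walk(fiber.child);
        walk(fiber.sibling);
      };
      walk(root);
      return out;
    }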
We trained an RL agent to interact with those components. The reward is simple: points for getting modified elements into the viewport, double for clicking or typing into them. About 30% of what it does is deliberately weird (partial form submits, hitting escape mid-modal) because real users do that stuff and polite AI models won't try it on their own.
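In spirit the reward looks something like this; the Step shape and exact weights below are illustrative, not our actual training code:

    type Step = {
      action: 'scroll' | 'click' | 'type' | 'key';
      targetRect?: DOMRect;     // box of the element the agent acted on
      changedRects: DOMRect[];  // boxes traced from the diff via the Fiber walk above
      viewport: { width: number; height: number };
    };

    function rectsOverlap(a: DOMRect, b: DOMRect): boolean {
      return a.left < b.right && b.left < a.right && a.top < b.bottom && b.top < a.bottom;
    }

    function reward(step: Step): number {
      const view = new DOMRect(0, 0, step.viewport.width, step.viewport.height);
      const visible = step.changedRects.some(r => rectsOverlap(r, view));
      if (!visible) return 0;  // nothing that changed is on screen yet
      const interacted =
        (step.action === 'click' || step.action === 'type') &&
        step.targetRect !== undefined &&
        step.changedRects.some(r => rectsOverlap(r, step.targetRect!));
      return interacted ? 2 : 1;  // double reward for clicking/typing on changed elements
    }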
This catches things unit tests miss completely: z-index bugs where something renders but you can't click it, scroll containers that trap you, handlers that fail silently.
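The "renders but you can't click it" case, for example, comes down to a hit test roughly like this (isActuallyClickable is a hypothetical helper, not a real API):

    function isActuallyClickable(el: Element): boolean {
      const rect = el.getBoundingClientRect();
      if (rect.width === 0 || rect.height === 0) return false;  // rendered, but collapsed
      const cx = rect.left + rect.width / 2;
      const cy = rect.top + rect.height / 2;
      const hit = document.elementFromPoint(cx, cy);  // topmost element at that point
      // If something else sits on top (overlay, stray z-index, transparent div),
      // the click never reaches the target even though it renders fine.
      return hit === el || (hit !== null && el.contains(hit));
    }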
What's janky right now: feature flags, persisting different user states, and anything that requires context the agent isn't given.
Free to try: https://morphllm.com/dashboard/integrations/github