Show HN: I made a heatmap diff viewer for code reviews

265•lawrencechen•3mo ago

0github.com is a pull request viewer that color-codes every diff line/token by how much human attention it probably needs. Unlike PR-review bots, we try to flag not just by "is it a bug?" but by "is it worth a second look?" (examples: hard-coded secret, weird crypto mode, gnarly logic, ugly code).

To try it, replace github.com with 0github.com in any pull-request URL. Under the hood, we split the PR into individual files, and for each file, we ask an LLM to annotate each line with a data structure that we parse into a colored heatmap.
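
As a rough illustration of the two steps described above (swapping the host, then splitting the PR into per-file chunks), here is a minimal Python sketch. The function names `to_heatmap_url` and `split_diff_by_file` are made up for this example and are not taken from the cmux codebase. The splitting assumes a standard unified `git diff` with `diff --git a/... b/...` headers and simple paths without spaces.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def to_heatmap_url(pr_url: str) -> str:
    """Swap the github.com host for 0github.com, keeping the path and query."""
    parts = urlsplit(pr_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {pr_url}")
    return urlunsplit(parts._replace(netloc="0github.com"))


def split_diff_by_file(diff_text: str) -> dict[str, str]:
    """Split a unified git diff into one chunk per file, keyed by the new path."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in diff_text.splitlines():
        # Each file section in `git diff` output starts with this header line.
        match = re.match(r"^diff --git a/(.+) b/(.+)$", line)
        if match:
            current = match.group(2)
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {path: "\n".join(lines) for path, lines in chunks.items()}
```

Each per-file chunk could then be sent to a model on its own, which keeps every request small regardless of how large the whole PR is.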

Examples:

https://0github.com/manaflow-ai/cmux/pull/666

https://0github.com/stack-auth/stack-auth/pull/988

https://0github.com/tinygrad/tinygrad/pull/12995

https://0github.com/simonw/datasette/pull/2548

Notice how all the example links have a 0 prepended before github.com. This navigates you to our custom diff viewer where we handle the same URL path parameters as github.com. Darker yellows indicate that an area might require more investigation. Hover on the highlights to see the LLM's explanation. There's also a slider on the top left to adjust the "should review" threshold.
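
To make the threshold slider and the yellow shading concrete, here is a small Python sketch of how per-line scores might be turned into highlight colors. The `LineAnnotation` shape, the 0.0 to 1.0 score scale, and the specific yellow are assumptions for illustration only; the post does not show the actual data structure the LLM returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineAnnotation:
    line: int      # line number within the file's diff
    score: float   # assumed scale: 0.0 (skim) to 1.0 (look closely)
    reason: str    # explanation shown in the hover tooltip


def highlight_color(score: float, threshold: float) -> Optional[str]:
    """Return a CSS color for a score, or None if it falls below the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < threshold:
        return None
    # Higher scores get a more opaque (stronger) yellow.
    alpha = round(0.15 + 0.75 * score, 2)
    return f"rgba(250, 204, 21, {alpha})"


def visible(annotations: list[LineAnnotation], threshold: float) -> list[LineAnnotation]:
    """Keep only the annotations that would be highlighted at this threshold."""
    return [a for a in annotations if highlight_color(a.score, threshold) is not None]
```

Moving the slider then amounts to re-running `visible` with a new threshold; the scores themselves never need to be recomputed.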

Repo (MIT license): https://github.com/manaflow-ai/cmux

Comments

jtwaleson•3mo ago

This is really useful. Might want to add a checkbox at a certain threshold, so that reviewers explicitly answer the concerns of the LLM. Also you can start collecting stats on how "easy to review" team members' PRs are, e.g. they'd probably get a better score if they address the concerns in the comments already.

timenotwasted•3mo ago

This is very cool and I could see it being really useful especially for those giant PRs. I'd prefer it if instead of the slider I could just click the different heatmap colors and if they indicated what exactly they were for (label not threshold). I get the underlying premise but at a glance it's more to process unless I was to end up using this constantly.

lawrencechen•3mo ago

Currently tooltips are shown when hovering on highlighted words. Need to make it visible on mobile though. Was wondering if you were thinking of another way to show the labels besides hovering?

timenotwasted•3mo ago

I was referring to something more akin to a legend like you have in the examples "(examples: hard-coded secret, weird crypto mode, gnarly logic)." where I could click "hard-coded secret" (not the best label but you get the idea) and it would filter on those instead of the slider.

383toast•3mo ago

Reminds me of this one, highlighting for text https://github.com/mattneary/salience

cdiamand•3mo ago

This is something I have found missing in my current workflow when reviewing PRs, particularly in the age of large AI-generated PRs.

I think most reviewers do this to some degree by looking at points of interest. It'd be cool if this could look at your prior reviews and try to learn your style.

Is this the correct commit to look at? https://github.com/manaflow-ai/cmux/commit/661ea617d7b1fd392...

lawrencechen•3mo ago

https://github.com/manaflow-ai/cmux/blob/main/apps/www/lib/s...

This file has most of the logic, the commit you linked to has a bunch of other experiments.

> look at your prior reviews and try to learn your style.

We're really interested in this direction too, maybe by setting up a DSPy system to automatically fit reviews to your preferences.

cdiamand•3mo ago

Thank you. This is a pretty cool feature that is just scratching the surface of a deep need, so keep at it.

Another perspective where this exact feature would be useful is in security review.

For example - there are many static security analyzers that look for patterns, and they're useful when you break a clearly predefined rule that is well known.

However, there are situations that static tools miss, but a highlight tool like this could help bring a reviewer's eyes to a high risk "area". I.e. scrutinize this code more because it deals with user input information and there is the chance of SQL injection here, etc.

I think that would be very useful as well.

austinwang115•3mo ago

This is a very interesting idea that we’ll definitely look into.

austinwang115•3mo ago

This makes reading long PRs not instantly LGTM… now the heatmap guides my eyes so I know where to look.

nzach•3mo ago

I think this "'should review' threshold" is a really great idea, but I probably wouldn't be able to trust it enough to make it useful.

wiether•3mo ago

I like the idea!

File `apps/client/electron/main/proxy-routing.ts` line 63

Adding a comment to explain why the downgrade is done would have resulted in not raising the issue?

Also two suggestions on the UI

- anchors on lines

- anchors on files and ability to copy a filename easily

lawrencechen•3mo ago

Good suggestions! Will make it more URL friendly.

> Adding a comment to explain why the downgrade is done would have resulted in not raising the issue?

Trying it out here with a new PR on same branch: https://0github.com/manaflow-ai/cmux/pull/809

Will check back on it later!

EDIT: seems like my comment on line 62 got highlighted. Maybe we should surface the ability to edit the prompt.

wiether•3mo ago

Thanks for the test!

Thinking about it with the feedback, I'm not sure of what I would have liked to see actually.

First I was expecting no highlight once you added a comment explaining why.

But then, seeing the highlight, I'm thinking that a comment shouldn't be a magical tool to allow doing crazy stuff.

I don't know anything about the Electron wrapper, so maybe it is actually possible to do HTTPS and someone could point out how to achieve this. And having the downgrade highlighted can help that someone find out.

I'll keep thinking about it! Thanks!

skeptrune•3mo ago

I feel like this is really smart. Going to have to set it up!

austinwang115•3mo ago

Just prepend 0 in front of github in your PR link and it should work

skeptrune•3mo ago

Ah, I see now.

n2d4•3mo ago

> https://0github.com/stack-auth/stack-auth/pull/988

Very fun to see my own PR on Hacker News!

This looks great. I'm probably gonna keep the threshold set to 0%, so a bit more gradient variety could be nice. Red-yellow-green maybe?

Also, can I use this on AI-generated code before creating a PR somehow? I find myself spending a lot of time reviewing Codex and Claude Code edits in my IDE.

lawrencechen•3mo ago

Yeah we definitely want to make the gradient and colors configurable.

What form factor would make the most sense for you? Maybe a CLI command that renders the diff in the terminal or as HTML?

n2d4•3mo ago

Either would work, I think. How I do it right now is that I let AI edit automatically, but then check the diff in Cursor before I stage my Git changes. May be different for others.

lawrencechen•3mo ago

Yeah, heatmapping the diff before creating a PR would need tighter IDE integration. We're working on cmux for this purpose. It's kinda an IDE, and it lives in the same repo: https://github.com/manaflow-ai/cmux.

After we add the heatmap diff viewer into cmux, I expect that I'll be spending most of my time in between the heatmap diff and a browser preview: https://github.com/manaflow-ai/cmux/raw/main/docs/assets/cmu...

froh•3mo ago

colorbrewer has proven high contrast gradients and also color blind options.

a cli command with two options, console (color) and HTML opens all doors, right?

blks•3mo ago

Perhaps more time than you would spend writing code yourself.

petralithic•3mo ago

Change the domain name, you will likely get a cease and desist otherwise.

ramonga•3mo ago

Maybe add some caching? I clicked one of the example PRs and it kept loading forever...

lawrencechen•3mo ago

Shoot, we should have caching in place already. Taking a look now

lawrencechen•3mo ago

Getting rate limited by GitHub, gonna add caching here as well. Temporary workaround is to sign in manually and return to example page: https://0github.com/handler/sign-in

austinwang115•3mo ago

pushed a fix, should work now

kburman•3mo ago

It’s an interesting direction, but feels pretty expensive for what might still be a guess at what matters.

I’m not sure an LLM can really capture project-specific context yet from a single PR diff.

Honestly, a simple data-driven heatmap showing which parts of the code change most often or correlate with past bugs would probably give reviewers more trustworthy signals.
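
For reference, the "which parts change most often" half of that idea can be sketched in a few lines of Python on top of `git log`. This is only an illustration: the function names are hypothetical, it counts whole files rather than line ranges, and it does not attempt the correlation with past bugs.

```python
import subprocess
from collections import Counter


def churn_by_file(name_only_log: str) -> Counter:
    """Count how many commits touched each path in `git log --name-only` output."""
    return Counter(
        line.strip() for line in name_only_log.splitlines() if line.strip()
    )


def read_git_log(repo_dir: str, since: str = "1 year ago") -> str:
    """Run git and return only the file names touched by each commit."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--since={since}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the ten most frequently changed files in the current repository.
    for path, count in churn_by_file(read_git_log(".")).most_common(10):
        print(f"{count:5d}  {path}")
```

The counts could be normalized against the busiest file to get a 0 to 1 "heat" per file, which is cheap to compute and needs no LLM.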

lawrencechen•3mo ago

Yeah this is honestly pretty expensive to run today.

> I’m not sure an LLM can really capture project-specific context yet from a single PR diff.

We had an even more expensive approach that cloned the repo into a VM and prompted codex to explore the codebase and run code before returning the heatmap data structure. Decided against it for now due to latency and cost, but I think we'll revisit it to help the LLM get project context.

Distillation should help a bit with cost, but I haven't experimented enough to have a definitive answer. Excited to play around with it though!

> which parts of the code change most often or correlate with past bugs

I can think of a way to do the correlation that would require LLMs. Maybe I'm missing a simpler approach? But agree that conditioning on past bugs would be great

kburman•3mo ago

For the correlation idea, you might take a look at how Sentry does it, they rely mostly on stack traces, error messages, and pattern matching to map issues back to code areas. It’s cheap, scalable, and doesn’t need an LLM in the loop, which could be a good baseline before layering anything heavier on top.

As for interactive reviews, one workflow I’ve found surprisingly useful is letting Claude Code simulate a conversation between two developers pair-programming through the PR. It’s not perfect, but in practice the dialogue and clarifying questions it generates often give me more insight than a single shot LLM summary. You might find it an interesting pattern to experiment with once you revisit the more context-aware approaches.

CuriouslyC•3mo ago

Gemini is better than GPT5 variants for large context. Also, agents tend to be bad at gathering an optimal context set. The best approach is to intelligently select from the codebase to generate a "covering set" of everything touched in the PR, make a bundle, and fire it off at Gemini as a one shot. Because of caching, you can even fire off multiple queries to Gemini instructing it to evaluate the PR from different perspectives for cheap.

lawrencechen•3mo ago

Yeah, adding a context gathering step is a good idea. Our original approach used codex cli in a VM, so context gathering was pretty comprehensive. We switched to a more naive approach due to latency, but having a step using a smaller model (like SWE-grep) could be a nice tradeoff.

nonethewiser•3mo ago

A large portion of the lines of code I'm considering when I review a PR are not part of the diff. This has to be a common experience - think of how often you want to comment on a line of code or file that just isn't in the PR. It happens almost every PR for me. They materialize as loose comments, or comments on a line like "Not this line per se, but what about XYZ?" or "You replaced this in 3 places, but I actually found 2 more it should be applied to."

I mean these tools are fine. But let's be on the same page that they can only address a sub-class of problems.

CuriouslyC•3mo ago

This is not that expensive with Gemini, they give free keys that have plenty of req/day, you can upload your diff + a bundle of the relevant part of the codebase and get this behavior for free, at least for a small team with ~10-20 PR's / day. If you could run this with personal keys, anyhow.

fluoridation•3mo ago

Might just be me, but I understood "expensive" in terms of raw computation necessary to get the answer. Some things aren't really worth computing, even if it's someone else footing the bill.

ivanjermakov•3mo ago

Premise is amazing. Wonder if there are tools that do something similar by looking at diff entropy.
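
One possible reading of "diff entropy" is the character-level Shannon entropy of each added line, which tends to be high for things like random-looking tokens or keys. The Python sketch below takes that interpretation; it is an assumption about what is meant here, not a description of an existing tool, and the function names are invented for the example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def added_line_entropy(diff_text: str) -> list[tuple[str, float]]:
    """Return (content, entropy) for every added line in a unified diff."""
    results = []
    for line in diff_text.splitlines():
        # Lines starting with "+++" are treated as file headers and skipped,
        # so an added line whose content begins with "++" is also skipped.
        if line.startswith("+") and not line.startswith("+++"):
            content = line[1:].strip()
            results.append((content, shannon_entropy(content)))
    return results
```

Sorting the result by entropy gives a crude ranking of added lines; it knows nothing about what the code does, so it would complement rather than replace an LLM-based score.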

cerved•3mo ago

> Honestly, a simple data-driven heatmap showing which parts of the code change most often or correlate with past bugs would probably give reviewers more trustworthy signals.

At first I thought this too, but now I doubt that's a good heuristic. That's probably where people would be careful and/or look anyway. If I were to guess, regressions are less likely to occur in "hotspots".

But this is just a hunch. There are tons of well reviewed and bug reported open source projects, would be interesting if someone tested it.

mmastrac•3mo ago

I tried it on a low-complexity Rust PR I worked on a few months back and it did a pretty good job. I'd probably change where the highlights live (for example x.y.z() -> x.w.z() should highlight y/w in a lot of cases).

For the most part, it seems to draw the eye to the general area where you need to look closer. It found a near-invisible typo in a coworker's PR which was kind of interesting as well.

https://0github.com/geldata/gel-rust/pull/530

It seems to flag _some_ deletions as needing attention, but I feel like a lot of them are ignored.

Is this using some sort of measure of distance between the expected token in this position vs the actual token?

EDIT: Oh, I guess it's just an LLM prompt? I would be interested to see an approach where the expected token vs actual token generates a heatmap.
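
A sketch of what such an "expected vs actual token" heatmap could look like in Python, assuming you already have a per-token log-probability (natural log) for each token in the diff from some language model. How those log-probabilities are obtained is left out, and `surprisal_heat` is a hypothetical name.

```python
import math


def surprisal_heat(logprobs: list[float]) -> list[float]:
    """Map per-token log-probabilities to heat values in the range 0.0 to 1.0.

    Surprisal is -log2(p): tokens the model found unlikely get high values.
    The values are scaled by the largest surprisal in the input.
    """
    surprisals = [-lp / math.log(2) for lp in logprobs]
    peak = max(surprisals, default=0.0)
    if peak <= 0:
        # Empty input, or every token was predicted with probability 1.
        return [0.0] * len(surprisals)
    return [s / peak for s in surprisals]
```

For example, tokens with probabilities 0.5 and 0.25 have surprisals of 1 and 2 bits, so they map to heat values of 0.5 and 1.0.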

lawrencechen•3mo ago

Happy to hear!

> Is this using some sort of measure of distance between the expected token in this position vs the actual token?

The main implementation is in this file: https://github.com/manaflow-ai/cmux/blob/main/apps/www/lib/s...

EDIT: yeah it's just an LLM prompt haha

Just a simple prompt right now, but I think we could try an approach where we directly see which tokens might be hallucinated. Gonna try to find the paper for this idea. Might be kinda analogous to the "distance between the expected token in this position vs the actual token."

rishabhaiover•3mo ago

wondering what if you run a SAST (a fast one) and share that with codex alongside the code diff?

antback•3mo ago

Very, very useful. I'll give it a try. Thanks for sharing!

fao_•3mo ago

How do I opt out of this tool? I do not want anyone reviewing my code or projects to use or engage with it and it is explicitly against the TOS of those projects. It would be nice if this tool screened for a robots.txt or something of the sort so that I could ensure that this tool never touches my projects.

lpapez•3mo ago

Don't share your code publicly then?

smcleod•3mo ago

Why does it require signing in and granting you full access to act as me on GitHub to use?

cmux-agent requires access to your Github account:

    Verify your GitHub identity
    Know what resources you can access
    Act on your behalf
    View your email addresses

I would have logged an issue for this but I see you've disabled logging issues on the repo. Seems a bit sus to me.

lawrencechen•3mo ago

Public repos shouldn't require being signed in.

Just tested these example links in incognito and seemed to work?

https://0github.com/manaflow-ai/cmux/pull/666

https://0github.com/stack-auth/stack-auth/pull/988

https://0github.com/tinygrad/tinygrad/pull/12995

https://0github.com/simonw/datasette/pull/2548

> you've disabled logging issues on the repo

Sorry, wasn't aware. Turning it on right now. EDIT: https://github.com/manaflow-ai/cmux/issues seems to be fine?

smcleod•3mo ago

It's when you first start the app it asks you to login using GitHub before you see anything else.

lawrencechen•3mo ago

cmux desktop app currently requires signing in to GitHub. We will build out better support for local repositories and remove sign in requirement soon.

csomar•3mo ago

It is a GitHub mess. See the discussion: https://github.com/orgs/community/discussions/37117

To keep it short, GitHub has OAuth Apps and "GitHub Apps". GitHub Apps are the new model and they can be installed to particular repos instead of having wide access to your account. GitHub recommends you use them. There is one catch however: GitHub architected these apps so that they can "act on the user's behalf". Even if your app only asks for "an email address", it will still have that "permission" even though there is nothing for it to act on.

Thus, the scary popup. I've found the only solution to this is to "complicate" your flow. If you go to https://codeinput.com (my app), and click login with GitHub, you'll be taken to a less scary popup that only asks for your email (it's an OAuth app!). This, however, is at the expense of you having to do the "authenticate + install" dance again after you log in! So I had to create an onboarding step to explain to the user the different steps they have to take.

tiffnami•3mo ago

yoooooo this looks awesome!

mattfrommars•3mo ago

Can someone please explain to me how people build these kinds of tools? My background is classic Java/C# backend development and SQL, plus a bit of microservices using Spring Boot. It's 8:30pm and I'm watching React tutorials to better understand how modern websites are built - e.g. useState, useRef etc.

Now, how does any of my experience translate to building tools like cmux? I genuinely want to understand how.

Is the answer to go line by line through the cmux codebase, or to attempt a PR for one of the bug issues on cmux and, by magic and time, I will eventually understand?

rahimnathwani•3mo ago

What would you recommend to someone new in your team who had only ever used python and a bit of SQL, and had never touched Java or Spring Boot?

joshribakoff•3mo ago

High level: hit github api, feed code to llm, display results in web app.

If you want to learn web apps, start with the docs, e.g. the official React docs, or even just learn vanilla JavaScript if you don’t know it.

Start with little pieces like hitting the github API and displaying some json in the terminal

You could also just start prompting an llm to scaffold a project for you and then trying to debug whatever issues come up (and they will)

lawrencechen•3mo ago

If your goal is to make something useful, I think the fastest way is probably to build a CLI only version since you can theoretically render heatmaps and make a task manager in a CLI form factor. And your background in Java/C# helps here.

Use Claude Code or Codex for everything, learn how to prompt well. >90% of cmux and 0github.com was written by LLMs. Most of it was just me asking the LLM to implement something, testing it to see if it works, and if it doesn't, I'll ask the LLM to write logs, and I'll paste the logs back to the LLM. Ask gpt-5-pro for architecture choices, like what tech/dependencies to use.

But if your goal is to learn React, I'd recommend going through the official getting started documentation, it's pretty good.

rf15•3mo ago

Your experience in coding is enough, you need more practice in "problem solving" with crazy ideas and working through them to the finish line.

Besides, this is just a thin layer on an LLM, with questionable actual quality. Learn to do the real work, no magic machine can take learning and skill building off your shoulders.

rs186•3mo ago

I suggest that you ask Claude Code to build such a website for you with a minimal set of features, with tons and tons of comments and design/architecture documents plus tests. Once that is done, you can start reading the code. You can even read as it is working.

Then, you can point Claude Code to a file/a function/a few lines and ask follow-up questions.

After that, there are even more things to do. If you want a different perspective, you could try completely reimplementing the thing. My guess is that Claude will use Next.js. You can ask Claude not to do that but instead use a different UI framework/no framework combined with C#, if that's something you are interested in. If you want to actually learn all the details, you can start setting things up yourself and write the website. You can add features or try making the site scalable, under AI-assisted or vibe coding mode.

It will not produce the most elegant code or have the best architecture, but will be good enough for your purpose. I think it's the most efficient way to get some learning that is specifically suited to your needs in this age.

otterley•3mo ago

You’re not going to be able to keep the domain name 0github.com for too long. I’d suggest you start finding a new one immediately.

personjerry•3mo ago

why?

nine_k•3mo ago

For the same reason your diff viewer highlights it: it looks like a scam attempt, not like a clever pun.

You likely will be able to keep it without trouble, but many corporate security systems would flag it.

otterley•3mo ago

Because it violates GitHub’s trademark. I expect them to send the author a cease and desist notice; and if the author is unresponsive or challenges the notice, GitHub will almost certainly initiate the dispute (UDRP) process, which will inevitably cede control of it to them.

HellsMaddy•3mo ago

This is cool! Please add a dark theme and respect `prefers-color-scheme: dark` :)

rf15•3mo ago

You should have clear metrics, not ChatGPT. ChatGPT is not trained on a huge dataset related to this task.

rckt•3mo ago

I think the need for such a tool should be avoided by simply making reasonable PRs. And what's ironic is that I now have to review PRs written by a no-code person using AI tools. They even respond to my comments using AI as well. And with this tool it becomes even more absurd. I guess the next step is to replace the reviewer with another AI tool.

usrxcghghj•3mo ago

I'm really digging the idea of using models to help contextualize the code you're looking at rather than just write it. READING code is the difficulty; if an LLM has the ability to add context clues and hints to help with this, I think it is a very powerful feature.

MattyRad•3mo ago

Seems like a catch-22. For codebases that I'm highly familiar with and regularly perform code review in, I'd say "thanks LLM, but I don't trust you, I'm more familiar with this codebase than you, and I don't need your help." For codebases that I'm not familiar with, I'm not really performing code review (at least not approving MR/PRs or doing the merging).

But still, this is very creative and a nice application of LLMs that isn't strictly barf.

MattyRad•3mo ago

Ok, I'll bite though, let's try it out as a non-maintainer.

I loaded https://0github.com/laravel/framework/pull/57499. Completely random, it's a PR in the last github repo I had open.

At 60%, it highlights significantly more test code than the material changes that need review. Strike one.

At no threshold (0-100) does it highlight the deleted code in UniqueBroadcastEvent.php, which seems highly important to review. The maintainer even comments about the removal in the actual PR! Strike two.

The only line that gets highlighted at > 50% in the material code diffs is one that hasn't changed. Strike three.

So, honest attempt, but it didn't work out for me.

DanielBryars•3mo ago

Great idea, really cool.

I noticed in the example you shared it highlighted the choice of SHA1 for further attention, because it was deprecated. I think that's good. In this case, let's say I do actually want to use it and pop a comment above it, "SHA1 deliberate, partitioning only, no security exposure". I presume the LLM would take that into account. I'll try it out when I can.

isodev•3mo ago

> Under the hood, we clone the repo into a VM, spin up gpt-5-codex for every diff, and ask it to output a JSON data structure that we parse into a colored heatmap.

Wait, you're consuming the energy a small town needs for a week just so you don't have to write a couple of lines to parse your content into a heat map of whatever strings you're looking for? This is crazy, and given our climate shituation, should be illegal.
