It's good at cleaning up decompiled code, at figuring out what functions do, at uncovering weird assembly tricks and more.
Anyway, we're reaching the point where documentation can be generated by LLMs and this is great news for developers.
Although of course, if you don't vibe-document but instead just use them as a tool, with significant human input, then yes, go ahead.
I then pasted this to another CC instance running the FE app, and it made the counterpart.
Yes, I could have CC running against both repos and sometimes do, but I often run separate instances when tasks are complex.
I don't want invented rationales for changes, I want to know the actual reason a developer decided that the code should work that way.
Not so much when you have a lot of code from 6 years ago, built around an obscure SDK, and you have to figure out how it works, and the documentation is both incredibly sparse and in Chinese.
If people __can__ actually read undocumented code with the help of LLMs, why do you need human-written documentation really?
It'll either all be in the cloud, so you never run the code...
Or it'll be on a chip, in a hermetically sealed USB drive that you plug into your computer.
> An open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified or shared (with or without modification) under defined terms and conditions.
https://en.wikipedia.org/wiki/Open_source
Companies have been really abusing what open source means: claiming something is "open source" because they share the code, and then having a license that says you can't use any part of it in any way.
Similarly, if you've ever used that software, or depending on where you downloaded it from, you might have agreed not to decompile or read the source code. Using that code is a gamble.
I expect that for games the more important piece will be the art assets - like how the Quake game engine was open source but you still needed to buy a copy of the game in order to use the textures.
They had gotten surprisingly close to a complete decompilation, but then they tried to request a copy of the source code from the copyright office citing that they needed it as a result of ongoing unrelated litigation with Nintendo.
Later on this killed them in court.
That's very different from the decompilation projects being discussed here, which do distribute the decompiled code.
These decompilation projects do involve some creative choices, which means that the decompilation would likely be considered a derivative work, containing copyrightable elements from both the authors of the original binary and the authors of the decompilation project. This is similar to a human translation of a literary work. A derivative work does have its own copyright, but distributing a derivative work requires permission from the copyright holders of both the original and the derivative. So a decompilation project technically can set their own license, and thereby add additional restrictions, but they can't overwrite the original license. If there is no original license, the default is that you can't distribute at all.
Relevant: https://github.com/orgs/community/discussions/82431
> When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you.
...but it's very clearly not an open source license.
FOSS specifically means/meant free and open source software; the words "free" and "open source" are there for a reason,
so we don't need yet another distinction like "source available" that people have to learn just to convey an already shared concept.
yes, companies abuse their community's interest in something by using the legal term "open source" as a marketing term
See my other comment: https://news.ycombinator.com/item?id=46175760
John Carmack
One of the great challenges of building apps is guessing the 80/20. I think we're actually entering the long-dreamt-of reusable component age.
And it will be great for retro game preservation.
Having more integrated tools and tutorials on this would be awesome.
First we get to tackle all of the small ideas and side projects we haven't had time to prioritize.
Then, we start taking ownership of all of the software systems that we interact with on a daily basis; hacking in modifications and reverse engineering protocols to suit our needs.
Finally our own interaction with software becomes entirely boutique: operating systems, firmware, user interfaces that we have directed ourselves to suit our individual tastes.
Though, that's only for actively developed software. I can imagine a great future where all retro games are now source available.
Obviously others aren’t concerned or don’t live in jurisdictions where that would be an issue.
https://hackaday.com/2014/07/31/playing-starcraft-on-an-arm/
What LLMs are (still?) not good at is one-shot reverse engineering for understanding by a non-expert. If that's your goal, don't blindly use an LLM. People already know that having an LLM blindly write prose or code is bad, but it's worth remembering that doing this for decompilation is even harder :)
Which is to say that Anthropic probably doesn't have good training documents and evals to teach the model how to do that.
Well they didn’t. But now they have some.
If the author wants to improve his efficiency even more, I'd suggest he start creating tools that let a human produce a text trace of a good run at decompiling this project.
Those traces can be hosted somewhere Anthropic can see, and then, after the next model pre-training, there's a good chance the model becomes even better at this task.
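Even something this simple would do; the file layout and field names below are made up, just to show the idea of a plain-text trace:

    import json, time

    def log_step(trace_path: str, role: str, content: str) -> None:
        # Append one step of the run (prompt, tool call, compiler output, diff...)
        # as a JSON line, so the whole trace stays a plain text file.
        with open(trace_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"ts": time.time(), "role": role, "content": content}) + "\n")

    # log_step("traces/func_80021A40.jsonl", "attempt", "recompiled with -O2, 3 instructions differ")
    # log_step("traces/func_80021A40.jsonl", "result", "match")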
I asked Opus how hard it would be to port the script extender for Baldur's Gate 3 from Windows to the native Linux build. It outlined that it would be very difficult for someone without reverse engineering experience, and correctly pointed out they are using different compilers, so it's not a simple mapping exercise. Its recommendation was not to try unless I was a Ghidra master and had lots of time on my hands.
I've had CC build semi-complex Tauri, PyQt6, Rust and SvelteKit apps for me without me ever having touched those languages. Is the code quality good? Probably not. But all those apps were local-only tools or had fewer than 10 users, so it doesn't matter.
For this project, it described its reasoning well, and knowing my own skillset and only having surface-level info on how one would start this, it made many good points that showed the project wasn't realistic for me.
I have this in my CLAUDE.md now.
“We are here to do the difficult and have plenty of time and there’s no rush.”
I also wasn't familiar with this terminology:
> You hand it a function; it tries to match it, and you move on.
In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
The author's previous post explains this all in a bunch more detail: https://blog.chrislewis.au/using-coding-agents-to-decompile-...
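In script form, a match check is conceptually just the following (compiler and tool names here are placeholders; real projects use dedicated tooling like decomp.me and asm-differ):

    import subprocess

    def text_section(obj_path: str, out_path: str) -> bytes:
        # Extract the raw .text (machine code) bytes from an object file.
        subprocess.run(["mips-objcopy", "-O", "binary", "--only-section=.text",
                        obj_path, out_path], check=True)
        with open(out_path, "rb") as f:
            return f.read()

    def is_matching(candidate_c: str, original_obj: str) -> bool:
        # Compile the hand-written C with the same (vintage) compiler and flags
        # as the original build, then compare the machine code byte for byte.
        subprocess.run(["mips-gcc", "-O2", "-mips3", "-c", candidate_c,
                        "-o", "candidate.o"], check=True)
        return text_section("candidate.o", "candidate.bin") == \
               text_section(original_obj, "original.bin")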
Eventually some or many of these attempts would, of course, fail, and require programmer intervention, but I suspect we might be surprised how far it could go.
The modern approach is: feed the errors back to the LLM and have it fix them.
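Conceptually something like this, with try_build and llm_fix as stand-ins rather than any real API:

    MAX_ATTEMPTS = 10  # past some point, iteration stops being productive

    def fix_until_matching(source: str, try_build, llm_fix):
        # try_build returns (ok, feedback); llm_fix is a stand-in for a model call.
        for _ in range(MAX_ATTEMPTS):
            ok, feedback = try_build(source)
            if ok:
                return source
            source = llm_fix(source, feedback)  # feed errors/diffs straight back
        return None  # give up and hand it to a human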
They had access to the same C compiler used by Nintendo in 1999? And the register allocation on a MIPS CPU is repeatable enough to get an exact match? That's impressive.
The groundwork for this kind of "matching" process is: sourcing odd versions of the obscure tooling that was used to build the target software 20 years ago, and playing with the flag combinations to find out which was used.
It helps that compilers back then were far less complex than those of today, and so was the code itself. But it's still not a perfect process.
There are cases of "flaky" code - for example, code that depends on the code around it. So you change one function, and that causes 5 other functions to no longer match, and 2 functions to go from not matching to matching instead.
Figuring out and resolving those strange dependencies is not at all trivial, so a lot of decompilation efforts end up wrapping up at some "100% functional, 99%+ matching".
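The flag-hunting part at least is mechanical enough to script; conceptually it's a grid search over plausible toolchains (names below are illustrative, not necessarily what this game used):

    from itertools import product

    COMPILERS = ["gcc-2.7.2", "gcc-2.8.1", "ido-5.3"]   # plausible for the era
    OPT_FLAGS = ["-O1", "-O2", "-O3"]
    ARCH_FLAGS = ["-mips2", "-mips3"]

    def find_toolchain(compile_and_compare):
        # compile_and_compare builds a known reference function with the given
        # combination and returns True if the output matches the original bytes.
        for compiler, opt, arch in product(COMPILERS, OPT_FLAGS, ARCH_FLAGS):
            if compile_and_compare(compiler, opt, arch):
                return compiler, opt, arch
        return None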
> Snowboard Kids 2 was written in C and compiled to MIPS machine code. The compiler was likely GCC 2.7.2 based on the instruction patterns [3]
The footnote is interesting: https://blog.chrislewis.au/using-coding-agents-to-decompile-...
> This is mostly just guesswork and trying different variations of compiler versions and configuration options. But it isn’t as bad as it sounds since the time period limits which compilers were plausibly used. The compiler arguments used in other, similar, games also provide a useful reference.
The interesting part: the model consistently underestimates its own speed. We built a complete bug bounty submission pipeline - target research, vulnerability scanning, POC development - in hours when it estimated days. The '10 attempts' heuristic resonates - there's definitely a point where iteration stops being productive.
For decompilation specifically, the 1M context window helps enormously. We can feed entire codebases and ask 'trace this user input to potential sinks' which would be tedious manually. Not perfect, but genuinely useful when combined with human validation.
The key seems to be: narrow scope + clear validation criteria + iterative refinement. Same as this decompilation work.
I also remember a few things in the singleplayer being very difficult. The number of times I had to fight/race Dameian in his giant robot running down the mountainside... It's carved into my brain like that footrace against Wizpig in DKR or the Donkey Kong arcade game for the Rareware coin in DK64.
The battle items in Snowboard Kids were clever and memorable. The parachute missile that would launch racers up in the air and then deploy the parachute so they slowly float back down was such a frustrating item to be hit with. The pans that would hit all opponents were iconic, and it was hilarious that you could somehow dodge them with invisibility. Even the basic rock dropped on the course was somehow memorable.
Great game. It's heartwarming to know that others still remember it and care about it.
Not what I would have expected from a 'one-shot'. Maybe self-supervised would be a more suitable term?
See also, "zero-shot" / "few-shot" etc.
1. Getting an LLM to do something based on a single example
2. Getting an LLM to achieve a goal from a single prompt with no follow-ups
I think both are equally valid.
We're essentially trying to map 'traditional' ML terminology to LLMs, it's natural that it'll take some time to get settled. I just thought that one-shot isn't an ideal name for something that might go off into an arbitrarily long loop.
It doesn't do it in one-shot on the GPU either. It feeds outputs back into inputs over and over. By the time you see tokens as an end-user, the clanker has already made a bunch of iterations.
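In toy form the loop is just:

    def generate(model_step, prompt_tokens, max_new_tokens):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            next_token = model_step(tokens)  # the model sees its own earlier output
            tokens.append(next_token)
        return tokens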
Like, if it ever leaks, or you were planning on releasing it, literally every step you took in your crime is uploaded to the cloud ready to send you to prison.
It's what's stopped me from using hosted LLMs for DMCA-legal RE. All it takes is for a prosecutor/attorney to spin a narrative based on uploaded evidence and your ass is in court.
The hardest form of code obfuscation is called homomorphic computing: code transformed to act on encrypted data, isomorphically to how regular code acts on regular data. The homomorphic code is hard obfuscated by that transformation alone.
Now create a homomorphic virtual machine that operates on encrypted code over encrypted data. Very hard to understand.
Now add data encryption/decryption routines, themselves homomorphically encrypted and run by the virtual machine, to prepare and recover the inputs, outputs, or effects of any data or event information for the homomorphic application code. With all data in the system encrypted by means that are hard obfuscated, running on code that is hard obfuscated, the entire system becomes hard^2 (not a formal measure) opaque.
This isn't realistic in practice. Homomorphic implementations of even simple functions are extremely inefficient for the time being. But it is possible, and improvements in efficiency have not been exhausted.
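For intuition only, textbook RSA (which happens to be multiplicatively homomorphic, and is nowhere near real FHE) already shows what computing on ciphertexts looks like:

    # Toy numbers: n = 61 * 53, e = 17, d satisfies e*d = 1 (mod lcm(60, 52)).
    n, e, d = 3233, 17, 2753

    def enc(m): return pow(m, e, n)
    def dec(c): return pow(c, d, n)

    a, b = 7, 6
    c = (enc(a) * enc(b)) % n   # the "server" multiplies values it never sees in the clear
    assert dec(c) == a * b      # only the key holder recovers 42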
Equivalent but different implementations of homomorphic code can obviously be made. However, given that the only credible explanation for the new code's design decisions is "exactly match the original code", this precludes any "clean room" defense.
--
Implementing software with neural network models wouldn't stop replication, but the result would decompile as source that was clearly not developed independently of the original implementation.
Even distilling (training a new model on the "decompiled" model) would be a dead giveaway that it was derived directly from the source, not a clean room implementation.
--
I have wondered whether quantum computing might enable an efficient version of homomorphic computing over classical data.
Just some wild thoughts.
If you are able to monitor what happens to encrypted data being processed by an LLM, could you not match that with the same patterns created by unencrypted data?
Real simple example, let's say I have a program that sums numbers. One sends the data to an LLM or w/e unencrypted, the other encrypted.
Wouldn't the same part of the LLM/compute machine "light up" so to speak?
(Of course, the reality is much more complicated than this; you can trace things like power and in theory track individual activations with great difficulty and then do interpretability to see what the model is doing. But hopefully it illustrates that you can usually take some sort of special operation and turn it into a process that does the same operation on different data.)
I stayed away from decompilation and reverse engineering, for legal reasons.
Claude is amazing. It can sometimes get stuck in a reasoning loop but will break away, reassess, and continue on until it finds its way.
Claude was murdered in a dark instance dungeon when it managed to defeat the dragon but ran out of lamp oil and torches to find its way out. Because of the light system it kept getting “You can’t seem to see anything in the darkness” and randomly walked into a skeleton lair.
Super fun to watch from an observer. Super terrifying that this will replace us at the office.
Have you tried asking them to simply open source the code?
Even if they were willing to (they're not), and even if they still had the code (they don't), it would contain proprietary code from Nintendo, and you'll never get your hands on that (legally).
I hope that others find this similarly useful.