Deterministic Fully-Static Whole-Binary Translation Without Heuristics

64•matt_d•1h ago

Comments

dmitrygr•26m ago

Cute, but Rice's theorem remains, and while they translated every byte as code, there is still no way to handle

   char buf[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
   return ((int (*)(void))buf)();

static translation is only possible when you assume no adversarial code AND mostly assume compiler-produced binaries. hand-rolled asm gets hard, and adversarial code is provably unsolvable in the general case.

still, pretty cool for cooperative binaries

tlb•23m ago

But in fact no modern processor/OS executes this either. Pages are marked as executable or not, and static data is loaded as non-executable pages.

dmitrygr•21m ago

that is why it was not "static const char buf[]" ;) it was not an accident

executable stacks are still common (including on Windows with some settings), and sometimes they are required (e.g. for GCC nested functions whose address is taken, which use on-stack trampolines)

diamondlovesyou•9m ago

That won't be located on the stack either. The underlying buffer will be a TU local, i.e. static and not rx.

fsmv•11m ago

I only read the abstract, but I got the impression that their solution to this is to keep both. They translate all the data as if it were code: if it gets called into, they use the translation, and if it gets read as memory, they use the original bytes.

Edit: I found this in the paper:

> Elevator sidesteps the code-versus-data determination altogether through an application of superset disassembly [6]: we simultaneously interpret every executable byte offset in the original binary as (i) data and (ii) the start of a potential instruction sequence beginning at that offset, and we build the superset control flow graph from every one of the resulting candidate decodes. Every potential target of indirect jumps, callbacks, or other runtime dispatch mechanisms that cannot be statically analyzed therefore has a corresponding landing point in the rewritten binary. These targets are resolved at runtime through a lookup table from original instruction addresses to translated code addresses that we embed in the final binary.
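A minimal sketch of the superset idea in the quote, under heavy simplifying assumptions: a hypothetical decoder that knows only four x86 opcodes (0x90 nop, 0xC3 ret, 0xB8 mov eax, imm32, 0xEB jmp rel8), and a table that maps each original byte offset to a translated-block index instead of a real code address. Every offset is tried as an instruction start, so overlapping decodes are all kept. `decode_len`, `build_superset_table` and `lookup_target` are illustrative names, not the paper's.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_TRANSLATION (-1)

/* Hypothetical mini-decoder for a tiny x86 subset:
   0x90 nop (1 byte), 0xC3 ret (1 byte), 0xEB jmp rel8 (2 bytes),
   0xB8 mov eax, imm32 (5 bytes). Returns the instruction length, or 0 if
   no valid instruction in this subset starts at `off`. */
static size_t decode_len(const uint8_t *code, size_t size, size_t off) {
    size_t len;
    if (off >= size) return 0;
    switch (code[off]) {
    case 0x90: case 0xC3: len = 1; break;
    case 0xEB:            len = 2; break;
    case 0xB8:            len = 5; break;
    default:              return 0;
    }
    return off + len <= size ? len : 0;  /* truncated decode is invalid */
}

/* Superset disassembly: treat EVERY byte offset as a potential instruction
   start. table[off] receives the index of the translated block for that
   offset (a running counter here), or NO_TRANSLATION if nothing decodes.
   Returns the number of candidate decodes. */
static size_t build_superset_table(const uint8_t *code, size_t size, int *table) {
    size_t n = 0;
    for (size_t off = 0; off < size; off++)
        table[off] = decode_len(code, size, off) ? (int)n++ : NO_TRANSLATION;
    return n;
}

/* Runtime dispatch for an indirect jump or call: map the original target
   offset to its translation through the embedded table. */
static int lookup_target(const int *table, size_t size, size_t orig_off) {
    return orig_off < size ? table[orig_off] : NO_TRANSLATION;
}
```

For the bytes {0xB8, 0x90, 0xC3, 0x00, 0x00, 0xC3}, offsets 0, 1, 2 and 5 all get entries, even though offsets 1 and 2 sit inside the immediate of the mov at offset 0, so an indirect jump to any of them can still be dispatched. The price is that the table and the translated code grow with the number of candidate decodes rather than the number of real instructions.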

genxy•3m ago

It looks like their system would just generate return 42;
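Where "return 42" comes from: B8 is the x86 opcode for mov eax, imm32, the next four bytes are the immediate in little-endian order (0x2A = 42), and C3 is ret. A minimal sketch that recognizes exactly that six-byte pattern, assuming the bytes are present in the binary to be decoded at all; `lift_return_const` is a hypothetical helper, not part of the paper's tool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Recognize "B8 imm32 C3", i.e. mov eax, imm32; ret, and extract the
   constant that the function would return in eax. */
static bool lift_return_const(const uint8_t *code, size_t size, int32_t *out) {
    if (size < 6 || code[0] != 0xB8 || code[5] != 0xC3)
        return false;
    /* x86 immediates are stored little-endian. */
    uint32_t imm = (uint32_t)code[1]
                 | ((uint32_t)code[2] << 8)
                 | ((uint32_t)code[3] << 16)
                 | ((uint32_t)code[4] << 24);
    *out = (int32_t)imm;
    return true;
}
```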

jonhohle•26m ago

This is neat. I haven’t looked into it in detail, but I would think relative offsets could still be an issue. There must be some translation layer/MMU-like mapping, though, since the generated code will be a different size anyway. This would primarily affect jump tables and internal branches.
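For relative branches, the usual fix-up is arithmetic on the displacement. A sketch assuming a 5-byte jmp rel32 (opcode E9), and assuming the translated target address has already been obtained from some original-to-translated mapping such as the lookup table quoted from the paper; the helper names are hypothetical. The displacement is relative to the end of the branch instruction, so it must be recomputed whenever the branch or its target moves.

```c
#include <stdbool.h>
#include <stdint.h>

enum { JMP_REL32_LEN = 5 };  /* E9 opcode + 4-byte displacement */

/* Target of "jmp rel32" located at pc: relative to the next instruction. */
static uint64_t rel32_target(uint64_t pc, int32_t rel) {
    return pc + JMP_REL32_LEN + (uint64_t)(int64_t)rel;
}

/* Re-encode the displacement for the translated copy of the branch, which
   now lives at new_pc and must reach new_target. Returns false if the
   distance no longer fits in a signed 32-bit displacement. */
static bool reencode_rel32(uint64_t new_pc, uint64_t new_target, int32_t *rel_out) {
    int64_t disp = (int64_t)(new_target - (new_pc + JMP_REL32_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;
    *rel_out = (int32_t)disp;
    return true;
}
```

Absolute addresses stored in jump tables cannot be patched this way unless the tables themselves are located precisely, which is one reason a runtime lookup from original to translated addresses is attractive.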

I mostly work on stuff from the 90s. Disassemblers make a lot of assumptions about where code starts and ends, and occasionally a binary blob is not discoverable unless you have some prior knowledge (e.g. a pointer at a fixed location to an entry point).

I would think after a few passes you could refine the binary into areas that are definitely code.
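One way to get "definitely code" regions is recursive-traversal disassembly: start from known entry points, follow fall-through and branch targets, and mark every reached instruction. A minimal sketch over a hypothetical x86 subset (nop, ret, mov eax, imm32, jmp rel8, je rel8); `insn_len` and `mark_code` are illustrative names. Bytes never reached stay unmarked, which means "unknown" rather than "data". Further passes would rerun it with entry points discovered elsewhere, e.g. from pointer tables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CODE 256  /* toy size limit for the fixed-size worklist */

/* Lengths for the hypothetical subset; 0 means "not decodable". */
static size_t insn_len(uint8_t op) {
    switch (op) {
    case 0x90: case 0xC3: return 1;  /* nop, ret */
    case 0xEB: case 0x74: return 2;  /* jmp rel8, je rel8 */
    case 0xB8:            return 5;  /* mov eax, imm32 */
    default:              return 0;
    }
}

/* Push an offset onto the worklist at most once; out-of-range offsets are ignored. */
static void push_once(size_t *stack, size_t *sp, bool *seen, size_t size, size_t off) {
    if (off < size && !seen[off]) {
        seen[off] = true;
        stack[(*sp)++] = off;
    }
}

/* Mark every byte of every instruction reachable from the entry points. */
static bool mark_code(const uint8_t *code, size_t size,
                      const size_t *entries, size_t n_entries, bool *is_code) {
    size_t stack[MAX_CODE];
    bool seen[MAX_CODE] = {false};
    size_t sp = 0;
    if (size > MAX_CODE) return false;
    for (size_t i = 0; i < size; i++) is_code[i] = false;
    for (size_t i = 0; i < n_entries; i++)
        push_once(stack, &sp, seen, size, entries[i]);

    while (sp > 0) {
        size_t off = stack[--sp];
        uint8_t op = code[off];
        size_t len = insn_len(op);
        if (len == 0 || off + len > size) continue;  /* invalid: stop this path */
        for (size_t k = 0; k < len; k++) is_code[off + k] = true;
        size_t next = off + len;
        if (op == 0xC3) continue;                    /* ret: no fall-through */
        if (op == 0xEB || op == 0x74) {
            int8_t rel = (int8_t)code[off + 1];      /* rel8 is signed */
            push_once(stack, &sp, seen, size, (size_t)((ptrdiff_t)next + rel));
            if (op == 0xEB) continue;                /* unconditional jump */
        }
        push_once(stack, &sp, seen, size, next);     /* fall-through */
    }
    return true;
}
```

For {B8 01 00 00 00, EB 02, FF FF, C3} with entry point 0, bytes 0 to 6 and byte 9 are marked as code, while the two FF bytes the jump skips over stay unmarked.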

Panzerschrek•3m ago

Can it handle self-modifying code?

Why only x86_64? It would make more sense to convert 32-bit programs, like many old games.
