Disassembling terabytes of random data with Zig and Capstone to prove a point

https://jstrieb.github.io/posts/random-instructions/

27•birdculture•4d ago

Comments

0x1ch•1h ago

I believe this is the third or fourth posting of this article in the last week.

degamad•49m ago

Yep: https://news.ycombinator.com/from?site=jstrieb.github.io

mfcl•37m ago

Why the AI disclosure? Is it just for the author to make sure the readers know they are AI-skeptic and use the opportunity to link to another article, or would there be something wrong with the proof had AI been used to help write the code?

(By help I mean just help, not write an entire sloppy article.)

jstrieb•19m ago

Hey, I wrote this! There are a couple of reasons that I included the disclosure.

The main one is to set reader expectations that any errors are entirely my own, and that I spent time reviewing the details of the work. The disclosure seemed to me a concise way to do that -- my intention was not any form of anti-AI virtue signaling.

The other reason is that I may use AI for some of my future work, and as a reader, I would prefer a disclosure about that. So I figured if I'm going to disclose using it, I might as well disclose not using it.

I linked to other thoughts on AI just in case others are interested in what I have to say. I don't stand to gain anything from what I write, and I don't even have analytics to tell me more people are viewing it.

All in all, I was just trying to be transparent, and share my work.

kazinator•9m ago

[delayed]

kazinator•32m ago

In common anecdotal experience with disassembling code, it is very common for data areas interspersed with code (like string literals) to disaassemble to instructions, momentarily causing the human to be puzzled: what is this repetition of five "or" instructions doing here referencing registers that would never be arguments?

The reason is that the opcode encoding is very dense, and has no redundancy against detecting bad encodings, and usually no relationship to neighboring words.

By that I mean that some four byte chunk (say) treated as an opcode word is treated that way regardless of what came before or what comes after. If it looks like an opcode with a four-byte immediate operand, then the disassembly will pull in that operand (which can be any bit combination) and skip another four bytes. Nothing in the operand will indicate "this is a bad instruction overall".

kazinator•11m ago

[delayed]

The last-ever penny will be minted today in Philadelphia

Project Euler

Marble: A Multimodal World Model

Steam Machine

Yt-dlp: External JavaScript runtime now required for full YouTube support

Steam Frame

Disassembling terabytes of random data with Zig and Capstone to prove a point

Homebrew no longer allows bypassing Gatekeeper for unsigned/unnotarized software

Launch HN: JSX Tool (YC F25) – A Browser Dev-Panel IDE for React

Voyager 1 is a light-day away by November 2026

A brief look at FreeBSD

Blasting Yeast with UV Light

Ioannis Yannas invented artificial skin for treatment of burns–dies at 90

OmniAI (YC W24) Is Hiring Forward Deployed Engineers

.NET 10

Jasmine: A Simple, Performant and Scalable Jax-Based World Modeling Codebase

How Tube Amplifiers Work

Valve Announces New Steam Machine, Steam Controller and Steam Frame

Louisiana Took Months to Sound Alarm Amid Whooping Cough Outbreak

Show HN: Gerbil – an open source desktop app for running LLMs locally

Learn Prolog Now

What happened to Transmeta, the last big dotcom IPO

Making the Clang AST Leaner and Faster

Fighting the New York Times' invasion of user privacy

Software Development in the Time of New Angels

Micro.blog launches new 'Studio' tier with video hosting

Hard drives on backorder for two years as AI data centers trigger HDD shortage

Waymo robotaxis are now giving rides on freeways in LA, SF and Phoenix

Async and Finaliser Deadlocks

Plumbing vs. Internet, Revisited

Disassembling terabytes of random data with Zig and Capstone to prove a point

Comments

The last-ever penny will be minted today in Philadelphia

Project Euler

Marble: A Multimodal World Model

Steam Machine

Yt-dlp: External JavaScript runtime now required for full YouTube support

Steam Frame

Disassembling terabytes of random data with Zig and Capstone to prove a point

Homebrew no longer allows bypassing Gatekeeper for unsigned/unnotarized software

Launch HN: JSX Tool (YC F25) – A Browser Dev-Panel IDE for React

Voyager 1 is a light-day away by November 2026

A brief look at FreeBSD

Blasting Yeast with UV Light

Ioannis Yannas invented artificial skin for treatment of burns–dies at 90

OmniAI (YC W24) Is Hiring Forward Deployed Engineers

.NET 10

Jasmine: A Simple, Performant and Scalable Jax-Based World Modeling Codebase

How Tube Amplifiers Work

Valve Announces New Steam Machine, Steam Controller and Steam Frame

Louisiana Took Months to Sound Alarm Amid Whooping Cough Outbreak

Show HN: Gerbil – an open source desktop app for running LLMs locally

Learn Prolog Now

What happened to Transmeta, the last big dotcom IPO

Making the Clang AST Leaner and Faster

Fighting the New York Times' invasion of user privacy

Software Development in the Time of New Angels

Micro.blog launches new 'Studio' tier with video hosting

Hard drives on backorder for two years as AI data centers trigger HDD shortage

Waymo robotaxis are now giving rides on freeways in LA, SF and Phoenix

Async and Finaliser Deadlocks

Plumbing vs. Internet, Revisited