Hey HN, I've been working on a CUDA pipeline that systematically searches for Bitcoin private keys generated by flawed software from Bitcoin's early years. The linked article is a technical deep dive into the GPU engineering — no source code (project is closed-source), but
detailed pseudocode and architecture diagrams.
The problem: Between 2009 and 2012, many Bitcoin wallet tools used predictable entropy — timestamp-seeded LCGs, brain wallets with simple passwords, the Debian OpenSSL bug (CVE-2008-0166) that reduced entropy to 15 bits, the Randstorm BitcoinJS flaw, PHP's mt_rand() seeded with
time*pid, and others. The private keys generated by these algorithms have tiny search spaces (some as small as 65,536 candidates), and the addresses are still on the blockchain with funded balances.
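To make "tiny search space" concrete, here is a minimal sketch of the failure pattern, assuming a wallet that seeds a glibc-style LCG with a Unix timestamp (illustrative, not code from the pipeline):

    // Hypothetical illustration of the failure pattern (not the pipeline's
    // code): a wallet seeds a glibc-style LCG with a Unix timestamp and
    // draws its "random" private key from it. Recovering the key is then a
    // search over plausible timestamps (about 2^25 seeds per year), not 2^256.
    #include <cstdint>

    static uint32_t lcg_state;

    static uint32_t lcg_next(void) {
        // glibc rand() TYPE_0 constants
        lcg_state = lcg_state * 1103515245u + 12345u;
        return (lcg_state >> 16) & 0x7fffu;    // only 15 usable bits per call
    }

    static void derive_privkey(uint32_t timestamp, uint8_t key[32]) {
        lcg_state = timestamp;                 // the entire "entropy"
        for (int i = 0; i < 32; ++i)
            key[i] = (uint8_t)lcg_next();      // 32 seed bits smeared over 256
    }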
The engineering: Everything lives in a single ~3,400-line CUDA C++ file. The per-thread pipeline is:
private key → secp256k1 EC multiply → SHA-256 → RIPEMD-160 → bloom filter → binary search → match
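In kernel form that pipeline looks roughly like the skeleton below. Names and types are hypothetical, and the helpers are stand-ins for functions defined elsewhere in the file; the real kernel has one specialization per vulnerability mode:

    // Simplified per-thread skeleton (hypothetical names and types).
    struct uint256 { uint32_t limb[8]; };
    struct ECPoint { uint256 x, y, z; };           // Jacobian coordinates
    struct Match   { uint64_t index; };

    // Stand-ins for functions defined elsewhere in the file.
    __device__ uint256 candidate_from_index(uint64_t idx);
    __device__ ECPoint scalar_mul_table(const uint256& k);
    __device__ void    hash160_pubkey(const ECPoint& p, uint32_t h160[5]);
    __device__ bool    bloom_maybe_contains(const uint32_t h160[5]);
    __device__ int     binary_search_hash160(const uint32_t* t, int n,
                                             const uint32_t h160[5]);

    __global__ void __launch_bounds__(256, 2)
    search_kernel(uint64_t batch_offset,
                  const uint32_t* __restrict__ targets, int num_targets,
                  Match* matches, int* match_count)
    {
        uint64_t idx  = batch_offset + blockIdx.x * (uint64_t)blockDim.x
                                     + threadIdx.x;
        uint256  priv = candidate_from_index(idx);  // replay the weak PRNG
        ECPoint  pub  = scalar_mul_table(priv);     // 16 lookups + 15 adds

        uint32_t h160[5];
        hash160_pubkey(pub, h160);                  // SHA-256 then RIPEMD-160

        if (bloom_maybe_contains(h160) &&           // rejects ~all candidates
            binary_search_hash160(targets, num_targets, h160) >= 0) {
            matches[atomicAdd(match_count, 1)].index = idx;  // host re-derives key
        }
    }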
Some decisions that might interest the GPU computing crowd (simplified sketches of these follow, after the list):
- EC multiplication via precomputed lookup tables (67 MB): The 256-bit private key is split into 16 × 16-bit chunks, each indexing into a precomputed table of generator point multiples. This reduces scalar multiplication from ~256 double-and-add iterations to 16 lookups + 15
point additions.
- Specialized SHA-256 kernels per pubkey format: Instead of a generic SHA-256, we have two dedicated functions — one for 33-byte compressed pubkeys (single block, partially precomputable message schedule) and one for 65-byte uncompressed pubkeys (two blocks, where the second block is almost entirely constant padding: only the final byte of Y spills into it). Both build the message schedule directly from EC point coordinates without a serialization buffer.
- 32 KB bloom filter in __constant__ memory: With ~3,100 target addresses and 100M candidates per batch, a naive binary search would thrash global memory. A bloom filter with 7 hash functions (FNV-1a + rotate-XOR double hashing) in constant memory rejects 99.9999% of candidates
with broadcast cache reads. Only ~100 false positives per batch reach the binary search confirmation stage in global memory.
- 256-bit modular arithmetic in PTX inline assembly: secp256k1 operates over a 256-bit prime field, implemented as eight 32-bit limbs with carry propagation. The sparse form of the secp256k1 prime (2^256 - 2^32 - 977) allows fast reduction.
- __launch_bounds__(256, 2) + --maxrregcount=128: Targeting 2 blocks per SM with 128 registers per thread (256 threads × 2 blocks × 128 registers = 65,536, exactly filling a 64K-register SM). Higher occupancy isn't better here: the kernel is compute-bound with heavy ALU usage, and 2 blocks provide enough ILP to hide latency without forcing register spills. The Mersenne Twister mode (2.5 KB of state per thread) is the exception and occasionally spills.
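To make those bullets concrete, a few simplified sketches (all names and helper functions are hypothetical stand-ins, not the production code). First, the table-lookup multiply: with table[i][j] = (j * 2^(16i)) * G precomputed on the host (16 × 65,536 affine points × 64 bytes = 64 MiB, the 67 MB above), the scalar multiplication collapses to a windowed sum:

    // Sketch: table[i][j] holds (j * 2^(16*i)) * G as an affine point in
    // global memory. Key limbs are little-endian (limb[0] least significant).
    __device__ ECPoint scalar_mul_table(const uint256& priv,
                                        const ECPointAffine* __restrict__ table)
    {
        ECPoint acc = ec_identity();
        for (int i = 0; i < 16; ++i) {
            // 16-bit chunk i of the key (two chunks per 32-bit limb)
            uint32_t chunk = (priv.limb[i / 2] >> ((i & 1) * 16)) & 0xffffu;
            if (chunk)                       // zero chunks contribute nothing
                acc = ec_add_mixed(acc, table[i * 65536 + chunk]);
        }
        return acc;                          // one Jacobian-to-affine at the end
    }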
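The compressed-pubkey SHA-256: a 33-byte message (a 0x02/0x03 prefix plus big-endian X) pads to exactly one 64-byte block, so the first 16 schedule words can be assembled straight from the x limbs, and W[9..14] are known zeros whose expansion terms fold into constants:

    // First 16 schedule words for SHA-256 of a 33-byte compressed pubkey,
    // built directly from the x coordinate (big-endian 32-bit limbs, x[0]
    // most significant). No serialization buffer needed.
    __device__ void sha256_schedule_pub33(uint32_t prefix,   // 0x02 or 0x03
                                          const uint32_t x[8], uint32_t W[16])
    {
        W[0] = (prefix << 24) | (x[0] >> 8);      // prefix + top 3 bytes of x
        #pragma unroll
        for (int i = 1; i < 8; ++i)
            W[i] = (x[i - 1] << 24) | (x[i] >> 8);
        W[8] = (x[7] << 24) | 0x00800000u;        // last x byte, then 0x80 pad
        #pragma unroll
        for (int i = 9; i < 15; ++i)
            W[i] = 0;                             // known zeros in the padding
        W[15] = 33 * 8;                           // length field: 264 bits
    }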
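The bloom filter probe, with the 7 hash functions derived from two base hashes via double hashing (Kirsch-Mitzenmacher); FNV-1a and rotate-XOR mixing as described above, constants illustrative:

    __constant__ uint32_t bloom_bits[8192];        // 32 KB = 262,144 bits

    __device__ bool bloom_maybe_contains(const uint32_t h160[5])
    {
        uint32_t h1 = 2166136261u;                 // FNV-1a over the 5 words
        uint32_t h2 = h160[0];
        for (int i = 0; i < 5; ++i) {
            h1 = (h1 ^ h160[i]) * 16777619u;
            if (i) h2 = ((h2 << 13) | (h2 >> 19)) ^ h160[i];  // rotate-XOR
        }
        h2 |= 1u;                                  // keep the probe stride odd

        for (int k = 0; k < 7; ++k) {              // 7 derived hash functions
            uint32_t bit = (h1 + k * h2) & 0x3ffffu;          // mod 2^18 bits
            if (!(bloom_bits[bit >> 5] & (1u << (bit & 31))))
                return false;                      // definitely not a target
        }
        return true;                               // candidate for binary search
    }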
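And the limb arithmetic: the carry chain is written out explicitly in PTX rather than left to the compiler. A plain 256-bit add looks like the sketch below; the real field addition would capture the final carry and feed it into the fast reduction by p = 2^256 - 2^32 - 977:

    // 256-bit add over eight 32-bit limbs via inline PTX. add.cc / addc.cc
    // propagate the hardware carry flag through the whole chain.
    __device__ __forceinline__ void add256(uint32_t r[8],
                                           const uint32_t a[8],
                                           const uint32_t b[8])
    {
        asm("add.cc.u32  %0, %8,  %16;\n\t"
            "addc.cc.u32 %1, %9,  %17;\n\t"
            "addc.cc.u32 %2, %10, %18;\n\t"
            "addc.cc.u32 %3, %11, %19;\n\t"
            "addc.cc.u32 %4, %12, %20;\n\t"
            "addc.cc.u32 %5, %13, %21;\n\t"
            "addc.cc.u32 %6, %14, %22;\n\t"
            "addc.u32    %7, %15, %23;\n\t"       // final carry dropped here
            : "=r"(r[0]), "=r"(r[1]), "=r"(r[2]), "=r"(r[3]),
              "=r"(r[4]), "=r"(r[5]), "=r"(r[6]), "=r"(r[7])
            : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
              "r"(a[4]), "r"(a[5]), "r"(a[6]), "r"(a[7]),
              "r"(b[0]), "r"(b[1]), "r"(b[2]), "r"(b[3]),
              "r"(b[4]), "r"(b[5]), "r"(b[6]), "r"(b[7]));
    }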
There are 23 vulnerability modes total, covering LCG PRNGs (glibc, MSVC, Borland Delphi, Java), brain wallet patterns (SHA-256 of counters, timestamps, phone numbers, short ASCII strings), MT19937, V8 JavaScript PRNGs, and the Debian OpenSSL disaster. The smaller modes (Debian:
65K candidates) complete in milliseconds; the largest (SHA-256 of ASCII strings 1-8 chars: ~2T) takes days across multiple GPUs.
The distributed side uses a FastAPI backend that assigns work units, verifies results via deterministic checkpoint regeneration (server independently recomputes the key from mode+offset using Python), and injects canary targets (honeypot hashes where the server knows the
private key) to detect cheating workers.
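One clarification on the verification step, since "deterministic checkpoint regeneration" is doing a lot of work in that sentence: every candidate key is a pure function of (mode, offset), so the server can re-derive any reported checkpoint without trusting the worker. A hypothetical host-side mirror of that mapping (the actual server-side reimplementation is in Python):

    #include <cstdint>

    enum Mode : uint32_t { MODE_GLIBC_LCG = 0 /* ...22 more modes */ };

    struct Candidate { uint8_t key[32]; };

    // Pure function of (mode, offset): GPU workers and the verifying server
    // compute exactly this, so a checkpoint is just an (offset, key) pair
    // the server can recompute and compare.
    Candidate candidate_from(Mode mode, uint64_t offset)
    {
        Candidate c{};
        switch (mode) {
        case MODE_GLIBC_LCG: {
            uint32_t state = (uint32_t)offset;     // offset doubles as the seed
            for (int i = 0; i < 32; ++i) {
                state = state * 1103515245u + 12345u;
                c.key[i] = (uint8_t)(state >> 16); // same replay as the GPU side
            }
            break;
        }
        // ...one deterministic replay per remaining mode
        }
        return c;
    }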
Happy to answer questions about the CUDA optimization, the secp256k1 implementation, or the weak entropy patterns.