saznlamsal•1h ago
I built this to challenge the assumption that we are strictly bound by the "Memory Wall." My hypothesis was that modern consumer silicon (like Apple M-series) has enough spare compute to decompress weights procedurally faster than it can read them from RAM.
The architecture uses a "Predator-Prey" method:
Predators: We identify and preserve the high-magnitude outliers (the "Alpha" weights) in FP16.
Prey: The remaining weights are compressed into ternary masks {-1, 0, 1} and a block-wise scalar.
Reconstruction: A bitwise kernel reconstructs the layer in L2 cache during the forward pass.
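For the curious, here's a rough sketch of the three steps above in plain NumPy (the repo itself is MLX). `outlier_frac` and `block` are illustrative parameters of my own choosing, not the repo's actual values, and a real implementation would store the outliers sparsely rather than as a dense FP16 array:

```python
import numpy as np

def predator_prey_compress(W, outlier_frac=0.01, block=64):
    """Split a weight matrix into FP16 outliers ("predators") and
    ternary {-1, 0, 1} codes with per-block scales ("prey")."""
    W = W.astype(np.float32)
    assert W.size % block == 0, "sketch assumes block-aligned layers"
    # Predators: keep the top-magnitude fraction of weights in FP16.
    k = max(1, int(outlier_frac * W.size))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    outlier_mask = np.abs(W) >= thresh
    outliers = np.where(outlier_mask, W, 0.0).astype(np.float16)
    # Prey: ternarize the remaining weights against a block-wise scale.
    prey = np.where(outlier_mask, 0.0, W).reshape(-1, block)
    scales = np.abs(prey).mean(axis=1, keepdims=True) + 1e-8
    ternary = np.clip(np.round(prey / scales), -1, 1).astype(np.int8)
    return outliers, ternary, scales.astype(np.float16), outlier_mask

def reconstruct(outliers, ternary, scales, outlier_mask):
    """Rebuild the dense layer: prey = ternary * scale, predators overwrite."""
    prey = ternary.astype(np.float32) * scales.astype(np.float32)
    prey = prey.reshape(outlier_mask.shape)
    return np.where(outlier_mask, outliers.astype(np.float32), prey)
```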
Results: I validated this on Llama-3-8B (Layer 20, SwiGLU).
Compression: ~3.0 bits per weight (effective).
Fidelity: 0.915 cosine similarity (weights) / 0.912 (outputs).
Size: Brings an 8B model down to ~3GB.
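You can sanity-check the ~3.0 effective bits/weight with a rough storage accounting. The layout below (2-bit packed ternary codes, an FP16 value plus a 32-bit position index per outlier, one FP16 scale per block) is my assumption for illustration, not the repo's documented format:

```python
def effective_bits_per_weight(outlier_frac=0.02, block=64):
    """Back-of-envelope storage cost per weight, in bits."""
    # Prey: every non-outlier weight stores a packed 2-bit ternary code.
    ternary_bits = (1.0 - outlier_frac) * 2
    # Predators: FP16 value + 32-bit position index per outlier.
    outlier_bits = outlier_frac * (16 + 32)
    # One FP16 scale amortized over each block of `block` weights.
    scale_bits = 16 / block
    return ternary_bits + outlier_bits + scale_bits
```

Under this accounting an outlier fraction in the low single-digit percents lands in the ~3 bits/weight range, consistent with an 8B model fitting in roughly 3 GB.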
The repo is currently a "Proof of Engine" in Python/MLX: the math works, but to realize the theoretical speed gain (125 t/s) I am porting the decompression kernels to Metal/CUDA.
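The port mostly means rewriting the bitwise decode loop. Here is a minimal NumPy sketch of 2-bits-per-weight packing and the shift-and-mask decode a GPU kernel would perform per block; the exact packing layout (four codes per byte, little-end first) is my assumption:

```python
import numpy as np

def pack_ternary(t):
    """Pack ternary values {-1, 0, 1} into 2-bit codes, four per byte."""
    codes = (t.astype(np.int8) + 1).astype(np.uint8).reshape(-1, 4)
    return (codes[:, 0] | (codes[:, 1] << 2) |
            (codes[:, 2] << 4) | (codes[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed, n):
    """Bitwise decode back to {-1, 0, 1}. This shift-and-mask loop is
    the part that would run per block in cache on Metal/CUDA."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    return codes.astype(np.int8).ravel()[:n] - 1
```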
Happy to answer questions about the compression logic or the "Predator" selection algorithm!