When the Mamba-3 paper dropped (currently under ICLR review), I wanted to understand how the new math actually worked. But official implementations of these architectures usually rely on heavily optimized custom Triton/CUDA kernels: incredibly fast for training, but almost impossible to read if you just want to follow the matrix math.
I spent the last few days reverse-engineering the paper to build mamba3-minimal: a pure-PyTorch, single-file implementation that runs natively on Mac (MPS), CPU, and CUDA.
It implements the three core innovations of the paper without any C++:
1. Trapezoidal Discretization & The "Shift" Hack: The new discretization rule introduces a strict sequential dependency across chunk boundaries (the beta term), which breaks standard PyTorch chunking. I solved this by shifting the sequences once at the global level before passing them into the chunked State Space Duality (SSD) algorithm.
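The idea behind the shift can be sketched in a few lines. This is a simplified illustration, not the repo's code: `Bx` stands for the per-step contributions `B_t * x_t`, and `beta`/`gamma` are hypothetical names for the trapezoid weights on the previous and current step. Shifting the whole sequence right by one position turns the cross-step term into an ordinary per-step input, which any chunked scan can then consume:

```python
import torch
import torch.nn.functional as F

def fold_trapezoid_inputs(Bx, beta, gamma):
    # Bx:          (batch, seq, d) per-step contributions B_t * x_t
    # beta, gamma: (batch, seq, 1) trapezoid weights (hypothetical names)
    # Shift Bx right by one step so position t sees B_{t-1} x_{t-1};
    # step 0 has no predecessor and is zero-padded.
    Bx_prev = F.pad(Bx, (0, 0, 1, 0))[:, :-1]
    # u_t = gamma_t * B_t x_t + beta_t * B_{t-1} x_{t-1} now depends only
    # on precomputed tensors at index t, so a chunked SSD pass over u no
    # longer needs to peek across chunk boundaries.
    return gamma * Bx + beta * Bx_prev
```

The key point is that the shift happens once, globally, before chunking, so every chunk sees a self-contained input tensor.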
2. Complex-Valued SSMs (Data-Dependent RoPE): Mamba-2 famously failed at state-tracking (scoring ~50%, i.e. chance, on parity tasks). This repo includes a test script showing the RoPE fix works: 100% accuracy on parity, extrapolating to length 64.
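To see why rotations fix parity, here is a toy sketch (not the repo's test script): treat the real state as complex pairs and rotate each pair by a per-step angle, exactly as RoPE does. With a rotation of pi for every 1-bit and 0 for every 0-bit, the state's sign tracks the running parity at any length; the actual fix makes the angle a learned, input-dependent quantity:

```python
import torch
from math import pi

def apply_state_rotation(h, theta):
    # h:     (..., 2k) real state, viewed as k complex pairs h1 + i*h2
    # theta: (..., k) per-step rotation angles (the "RoPE" part)
    h1, h2 = h[..., 0::2], h[..., 1::2]
    cos, sin = theta.cos(), theta.sin()
    out = torch.empty_like(h)
    out[..., 0::2] = h1 * cos - h2 * sin  # Re of e^{i*theta} * (h1 + i*h2)
    out[..., 1::2] = h1 * sin + h2 * cos  # Im
    return out

# Toy parity run: three 1-bits rotate the state by 3*pi in total, so the
# first component ends at cos(3*pi) = -1, i.e. "odd".
h = torch.tensor([1.0, 0.0])
for bit in [1, 0, 1, 1]:
    h = apply_state_rotation(h, torch.tensor([pi * bit]))
```

A purely real, positive decay can only shrink the state monotonically; the rotation gives it a sign (a complex eigenvalue), which is what parity needs.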
3. MIMO (Multi-Input Multi-Output): Standard decoding is memory-bound. I implemented the rank-expansion formulation from Appendix D, which shifts the state update from a memory-bound outer product to a compute-bound matrix multiplication, all through clean einsum operations.
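The rank-expansion point can be made concrete with two einsums. Sizes below are hypothetical (`n` = state dim, `p` = head dim, `r` = MIMO rank); this is a sketch of the shape change, not the repo's kernel:

```python
import torch

n, p, r = 16, 8, 4
B = torch.randn(n, r)   # rank-expanded input projection
X = torch.randn(r, p)   # r input channels for one timestep

# Rank-1 (Mamba-2 style) update: an outer product, roughly one
# multiply-add per element loaded, hence memory-bound at decode time.
h_rank1 = torch.einsum('n,p->np', B[:, 0], X[0])

# Rank-r MIMO update: a matmul, r multiply-adds per state entry, so the
# same state traffic amortizes r times more compute.
h_mimo = torch.einsum('nr,rp->np', B, X)
```

Both produce an `(n, p)` state update; only the arithmetic intensity per byte of state moved changes.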
The repo includes self-tests verifying that the O(T) chunked training pass produces the same logits as the O(1)-per-token sequential autoregressive step (max_diff < 1e-6).
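The flavor of that equivalence check is easy to show on a scalar linear recurrence h_t = a_t h_{t-1} + u_t (a stand-in for the real model, not the repo's test): run it token by token, then run it chunk by chunk with a closed-form scan inside each chunk, and compare:

```python
import torch

def sequential_scan(a, u):
    # Decode path: h_t = a_t * h_{t-1} + u_t, one token at a time.
    h, out = torch.zeros(()), []
    for a_t, u_t in zip(a, u):
        h = a_t * h + u_t
        out.append(h)
    return torch.stack(out)

def chunked_scan(a, u, chunk=4):
    # Training path: closed form inside each chunk, state carried across
    # chunk boundaries -- the structure the SSD algorithm exploits.
    h, outs = torch.zeros(()), []
    for s in range(0, len(a), chunk):
        ac, uc = a[s:s + chunk], u[s:s + chunk]
        P = torch.cumprod(ac, dim=0)              # P_t = a_s * ... * a_t
        L = torch.tril(P[:, None] / P[None, :])   # L[t, j] = a_{j+1}...a_t
        outs.append(L @ uc + P * h)               # in-chunk sums + carried state
        h = outs[-1][-1]
    return torch.cat(outs)
```

With decays kept in (0.5, 0.95) the two paths agree to float32 precision, which is the same max-diff style of check the repo's self-tests run on full logits.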
If you've been wanting to read the math behind Mamba-3, I built this to be a readable Rosetta Stone. Would love any feedback on the implementation or the PyTorch optimizations!